## Data structures, Hash tables

on , , , 8 minutes reading

Well, time to go back to data structures. So far we had seen queues, stacks and linked lists and assumed you are already familiar with arrays and vectors we can move to the next linear data structure in the list of basic structures that we should know, the amazing hash tables.

At simple sight a hash table is very similar to the bucket sorting we already used, we have a linear structure divided in buckets and basically when we need to add a new element we calculate its key using a formula/function based in the value we need to store. This will make the whole process a $$\mathcal{O}(1)$$ operation for adding and searching the value most of the times and that makes them a fantastic data structure for searching data that is constantly accessed (and avoiding linear or binary search).

### The hash function

We can represent a hash table with a simple array of length $$n$$, and use a simple formula $$k = n \mod l$$ to calculate the place in the array ($$k$$) of length $$l$$. In reality hash functions are a lot more complex than this and calculate an effective hash is not an easy task. If you had used or coded in .Net or Java before, you are probably aware that every object has a hash function of itself (GetHashCode and hashCode in .Net and Java), this is to know what would be the correct hash code to place the element in a hash table. With this simple formula placing the value $$12$$ in a hash table with an internal length of $$7$$ would be in the index $$5 \equiv 12 \pmod 7$$. We know if we need to look for the value again we just calculate the hash and check if it is in that “bucket”.

But what would happen if there is already another value in the same bucket? for example, 19 and 12 have the same modulo ($$19 \pmod 7 \equiv 12 \pmod 7$$)? This will lead to what we call a collision and we have to option to resolve such collision. There are many ways to do this and the easiest is simply not solving it and failing if there is an element in the hash table already.

To simplify the development, we can create a base class for hash tables and implement the “no collision” hash table:

abstract class Hash() {
protected open var buckets: Array<Int?> = arrayOfNulls(7)
protected open fun hash(value: Int): Int = value % buckets.size

open fun exists(value: Int): Boolean {
val key = hash(value)
return buckets[key] == value
}

val size: Int
get() = buckets.size
}


Because we don’t allow collisions, we simply throw an exception:

class HashCollisionException(message: String): Exception(message)

class CollisionHash(size: Int = 7): Hash() {
val key = hash(value)
if (buckets[key] != null && buckets[key] != value)
throw HashCollisionException("The value \$value produces a collision in the hash")
buckets[key] = value
}
}


Simple to test as well:

class CollisionHashTests {
@Test
fun In an empty hash nothing wrong happens() {
val hash = CollisionHash()

assertTrue { hash.exists(12) }
assertTrue { hash.exists(2) }
}

@Test
fun If the bucket is already occupied we throw() {
val hash = CollisionHash()

// 19 produces the same hash as 12
}

@Test
fun If the bucket has already the same element nothing happens() {
val hash = CollisionHash()

assertTrue { hash.exists(12) }
}
}


While this is the easiest to implement it is as well the most unpractical to use, I mean, we will end up with a very small hash table that basically cannot keep more than just a small bunch of elements. Another collision resolution technique is extend and rehash and it is pretty much as the name says, you extend the inner container and then rehash all the elements and repeat until there are no collisions, this is one of the most used collision resolution in simple hash tables:

class ExtendHash(val growRate: Double = 1.5): Hash() {
private fun rehash() {
val newSize = (size * growRate).toInt()
val items = buckets.copyOf()

buckets = arrayOfNulls(newSize)
}

val key = hash(value)
if (buckets[key] != null && buckets[key] != value) {
rehash()
}
else buckets[key] = value
}
}


The tests are not really that complicated:

class ExtendHashTests {
@Test
fun In an empty hash nothing wrong happens() {
val hash = CollisionHash()

assertTrue { hash.exists(12) }
assertTrue { hash.exists(2) }
}

@Test
fun If the bucket is already occupied we extend and rehash() {
val hash = ExtendHash()
val initial = hash.size