Pritam Karmakar

Reputation: 2801

How does a hashtable track existing key indexes when it resizes?

I was wondering how a hashtable finds the correct index after it increases its capacity. For example, let's assume I have a hashtable with a default capacity of 10, and we add the (key, value) pair [14, "hello 1"].

Using the index mechanism below, the index for the key 14 is 4, so the hashtable saves this (key, value) pair at index 4.

int index = key.GetHashCode() % 10;

Now we keep adding items to the hashtable until it reaches its load factor, so it's time to resize. Let's assume the hashtable resizes to 20.

Now I search for my old key 14 in this hashtable. According to the index mechanism, I now get index 14 for this key, so I would start searching at index 14, but the entry was actually stored at index 4.

So my question is: how does a hashtable track existing key indexes when it resizes? Or does it rehash all existing keys when it resizes?

Upvotes: 2

Views: 1357

Answers (2)

JoshVarty

Reputation: 9426

I've looked through the Shared Source CLI implementation for .NET, and it looks like the entries are rehashed upon expansion. However, it is not necessary to recompute each hash code by calling GetHashCode() again: each bucket stores its hash code (in hash_coll), and that stored value is reused.

If you look through the implementation, you'll see the expand() method, in which the following steps occur:

  1. A temporary bucket array is created, sized to the smallest prime greater than twice the current size.
  2. The new array is populated by rehashing from the old bucket array.


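// Copy every live entry from the old bucket array into the new one.
for (nb = 0; nb < oldhashsize; nb++)
{
    bucket oldb = buckets[nb];
    // Skip empty slots (null key) and slots whose key is the buckets array itself,
    // which marks a removed entry.
    if ((oldb.key != null) && (oldb.key != buckets))
    {
        // Reuse the stored hash code; the mask clears the collision (sign) bit.
        putEntry(newBuckets, oldb.key, oldb.val, oldb.hash_coll & 0x7FFFFFFF);
    }
}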



private void putEntry (bucket[] newBuckets, Object key, Object nvalue, int hashcode)
{
    BCLDebug.Assert(hashcode >= 0, "hashcode >= 0");  // make sure collision bit (sign bit) wasn't set.

    // Double hashing: seed selects the first bucket, incr is the step between probes.
    uint seed = (uint) hashcode;
    uint incr = (uint)(1 + (((seed >> 5) + 1) % ((uint)newBuckets.Length - 1)));

    do
    {
        int bucketNumber = (int) (seed % (uint)newBuckets.Length);

        // Free slot: store the entry here and stop probing.
        if ((newBuckets[bucketNumber].key == null) || (newBuckets[bucketNumber].key == buckets))
        {
            newBuckets[bucketNumber].val = nvalue;
            newBuckets[bucketNumber].key = key;
            newBuckets[bucketNumber].hash_coll |= hashcode;
            return;
        }
        // Slot is occupied: set its collision bit and move on to the next probe.
        newBuckets[bucketNumber].hash_coll |= unchecked((int)0x80000000);
        seed += incr;
    } while (true);
}

Once the new array has been built, it replaces the old one and is used in subsequent operations.
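To make the idea of rehashing from stored hash codes easier to follow, here is a minimal sketch. It is not the SSCLI code: it uses plain linear probing instead of the double hashing above, doubles the capacity instead of picking a prime, and does not handle removal or duplicate keys. The names TinyTable, Entry, Place, and IndexOf are made up for illustration.

```csharp
using System;

// Illustrative only: a tiny open-addressing table that caches each key's hash code,
// so that resizing never has to call GetHashCode() again.
// Assumes capacity > 0 and that the same key is never added twice.
class TinyTable
{
    private struct Entry
    {
        public object Key;
        public object Value;
        public int Hash;   // cached, non-negative hash code
    }

    private Entry[] buckets;
    private int count;

    public TinyTable(int capacity) { buckets = new Entry[capacity]; }

    public int Capacity { get { return buckets.Length; } }

    public void Add(object key, object value)
    {
        // Grow once the table would be more than 70% full (arbitrary load factor for this sketch).
        if ((count + 1) * 10 > buckets.Length * 7) Resize(buckets.Length * 2);
        Place(buckets, key, value, key.GetHashCode() & 0x7FFFFFFF);
        count++;
    }

    // Returns the bucket index that currently holds the key, or -1 if it is absent.
    public int IndexOf(object key)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        int i = hash % buckets.Length;
        // Linear probing: walk forward until the key or an empty slot is found.
        while (buckets[i].Key != null)
        {
            if (buckets[i].Hash == hash && buckets[i].Key.Equals(key)) return i;
            i = (i + 1) % buckets.Length;
        }
        return -1;
    }

    public object Get(object key)
    {
        int i = IndexOf(key);
        return i < 0 ? null : buckets[i].Value;
    }

    private void Resize(int newSize)
    {
        Entry[] newBuckets = new Entry[newSize];
        foreach (Entry e in buckets)
        {
            // Re-place every live entry using its cached hash code.
            if (e.Key != null) Place(newBuckets, e.Key, e.Value, e.Hash);
        }
        buckets = newBuckets;
    }

    private static void Place(Entry[] target, object key, object value, int hash)
    {
        int i = hash % target.Length;
        while (target[i].Key != null) i = (i + 1) % target.Length;
        target[i].Key = key;
        target[i].Value = value;
        target[i].Hash = hash;
    }
}
```

With a capacity of 10, the key 14 lands in bucket 4. After enough additions to trigger a resize to 20, the same key is found in bucket 14, because both the insert and the lookup compute the index against the current array length.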

Also, from MSDN regarding Hashtable.Add():

If Count is less than the capacity of the Hashtable, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.

Upvotes: 1

Kirk Woll

Reputation: 77546

You might want to read up on hash tables, but the concept I think you're missing is this:

  • For a given key, say "asdf", there is a given 32-bit int hash code.
  • To get the position within indexed storage, you take the hash code modulo the table length (hashCode % length), so if you grow your table from 10 to 20, the result can change to a new index. Implementations therefore move each existing entry into its proper bucket in the new table when they resize.
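As a small illustration of the second point, here is a sketch of the index calculation using the numbers from the question. IndexDemo and BucketIndex are hypothetical names, and real implementations may map hash codes to buckets differently.

```csharp
using System;

static class IndexDemo
{
    // Hypothetical helper: maps a hash code to a bucket index for a table of the given length.
    public static int BucketIndex(int hashCode, int length)
    {
        // Clear the sign bit so that negative hash codes still yield a valid index.
        return (hashCode & 0x7FFFFFFF) % length;
    }
}
```

Since Int32.GetHashCode() returns the integer itself, IndexDemo.BucketIndex(14.GetHashCode(), 10) gives 4 and IndexDemo.BucketIndex(14.GetHashCode(), 20) gives 14. The entry has to be moved during the resize precisely because the lookup will use the new length.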

Upvotes: 2
