Joan Venge

Reputation: 330962

Hashtable doubling?

I don't know if the title makes sense, but I'm wondering how a hashtable grows when you add items to it.

Is it like List<T>, where it doubles in size when the capacity is reached? If so, does this doubling recreate the collection from scratch? (This can be answered for List<T> too, since I'm not sure if that's what it does.)

Lastly, if it does recreate the collection from scratch, then that particular Add operation would be very expensive for a user who has no way of knowing the limit has been reached, right?

Upvotes: 2

Views: 1121

Answers (6)

Jon Skeet

Reputation: 1500455

I believe both Hashtable and Dictionary<TKey, TValue> expand to the next prime number after doubling the current count, e.g. 31 to 67.

As I understand it, a resize doesn't involve recomputing the hashes (as they're stored with the entries) but involves putting each entry into its new bucket, where the bucket number is based on both the hash code and the bucket count.
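To make that concrete, here is a minimal sketch (not the actual BCL code) of what such a resize can look like when hash codes are cached on the entries; the Entry type, the buckets field and the chaining are assumptions for illustration only:

private void Resize(int newBucketCount)
{
    // Minimal sketch: each entry already carries its cached hash code,
    // so resizing never calls GetHashCode() again.
    Entry[] newBuckets = new Entry[newBucketCount];
    foreach (Entry head in buckets)
    {
        for (Entry e = head; e != null; )
        {
            Entry next = e.Next;
            // Only the bucket index changes, because the bucket count changed.
            int newIndex = (e.HashCode & 0x7FFFFFFF) % newBucketCount;
            e.Next = newBuckets[newIndex];
            newBuckets[newIndex] = e;
            e = next;
        }
    }
    buckets = newBuckets;
}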

You asked about List<T> - there it's really simple. The list is backed by an array, and you just need to create a new array with the right size, and copy the contents of the current array. Something like:

private void Resize(int newCapacity)
{
    // Allocate a larger array, copy the existing elements across and
    // swap it in; the old array is left for the garbage collector.
    T[] tmp = new T[newCapacity];
    Array.Copy(backingArray, tmp, backingArray.Length);
    backingArray = tmp;
}

Upvotes: 5

Jason Coyne

Reputation: 6636

The sizes are not always exactly doubled; the growth factor varies with the number of items.

For a list, this is not nearly as expensive as re-creating a string or array, for example, since only the references need to be copied from one backing array to the other, and that can be done very efficiently.

For a hashtable/dictionary the items must be redistributed, and that can be very expensive. It's best to initialize your hashtable with an estimated size ahead of time, as shown below.
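For example, both collections have constructors that take an initial capacity; the 10000 below is just an illustrative estimate, not a recommendation:

// Pre-sizing avoids repeated grow-and-redistribute cycles while the table fills up.
Hashtable table = new Hashtable(10000);                           // System.Collections
Dictionary<string, int> map = new Dictionary<string, int>(10000); // System.Collections.Generic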

Upvotes: 0

Varkhan

Reputation: 16761

It all depends on your hash implementation of course.

Some hashes double, some change their size to another arbitrary size (for instance, the next prime number).

Most hashes will need a rehash after changing their buffer size, which is "just" moving pointers around, but is still linear in the table size. However, some hashes use consistent hashing, which reduces the need to move elements around (typically, only a small fraction of the elements needs to be moved).
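As a rough illustration of why a plain modulo-based table has to move most entries on resize (the bucket counts below are arbitrary example values):

// With plain modulo mapping, almost every key lands in a different bucket
// once the bucket count changes, so a full redistribution pass is needed.
int oldBuckets = 31, newBuckets = 67, moved = 0;
for (int key = 0; key < 1000; key++)
{
    if (key % oldBuckets != key % newBuckets)
    {
        moved++;
    }
}
Console.WriteLine(moved + " of 1000 keys changed buckets");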

Upvotes: 0

J.W.

Reputation: 18181

Why not dig into Reflector and do some research if you're interested? Here is Hashtable.Insert as decompiled:

private void Insert(object key, object nvalue, bool add)
{
    uint num;
    uint num2;
    if (key == null)
    {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }
    // Grow the bucket array (roughly doubled, then rounded up to a prime)
    // once the table reaches its load limit.
    if (this.count >= this.loadsize)
    {
        this.expand();
    }
    // Too many collision-marked slots relative to the load size: rebuild in place.
    else if ((this.occupancy > this.loadsize) && (this.count > 100))
    {
        this.rehash();
    }
    // InitHash returns the masked hash code and produces the double-hashing
    // seed (num) and probe increment (num2).
    uint num3 = this.InitHash(key, this.buckets.Length, out num, out num2);
    int num4 = 0;
    int index = -1;
    int num6 = (int) (num % this.buckets.Length);
Label_0071:
    if (((index == -1) && (this.buckets[num6].key == this.buckets)) && (this.buckets[num6].hash_coll < 0))
    {
        index = num6;
    }
    if ((this.buckets[num6].key == null) || ((this.buckets[num6].key == this.buckets) && ((this.buckets[num6].hash_coll & 0x80000000L) == 0L)))
    {
        if (index != -1)
        {
            num6 = index;
        }
        Thread.BeginCriticalRegion();
        this.isWriterInProgress = true;
        this.buckets[num6].val = nvalue;
        this.buckets[num6].key = key;
        this.buckets[num6].hash_coll |= (int) num3;
        this.count++;
        this.UpdateVersion();
        this.isWriterInProgress = false;
        Thread.EndCriticalRegion();
    }
    else if (((this.buckets[num6].hash_coll & 0x7fffffff) == num3) && this.KeyEquals(this.buckets[num6].key, key))
    {
        if (add)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_AddingDuplicate__", new object[] { this.buckets[num6].key, key }));
        }
        Thread.BeginCriticalRegion();
        this.isWriterInProgress = true;
        this.buckets[num6].val = nvalue;
        this.UpdateVersion();
        this.isWriterInProgress = false;
        Thread.EndCriticalRegion();
    }
    else
    {
        if ((index == -1) && (this.buckets[num6].hash_coll >= 0))
        {
            this.buckets[num6].hash_coll |= -2147483648;
            this.occupancy++;
        }
        // Collision: probe the next slot using the double-hashing increment.
        num6 = (int) ((num6 + num2) % ((ulong) this.buckets.Length));
        if (++num4 < this.buckets.Length)
        {
            goto Label_0071;
        }
        if (index == -1)
        {
            throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_HashInsertFailed"));
        }
        Thread.BeginCriticalRegion();
        this.isWriterInProgress = true;
        this.buckets[index].val = nvalue;
        this.buckets[index].key = key;
        this.buckets[index].hash_coll |= (int) num3;
        this.count++;
        this.UpdateVersion();
        this.isWriterInProgress = false;
        Thread.EndCriticalRegion();
    }
}

Upvotes: 0

Ben S

Reputation: 69342

From the MSDN page on Hashtable.Add():

If Count is less than the capacity of the Hashtable, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.

Since List<T> has the same remark, I would assume they work similarly internally with regard to memory allocation.
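If you want to watch the growth happening for List<T>, its Capacity property makes the resizes visible; this is just a small experiment, not anything from the documentation:

List<int> list = new List<int>();
int lastCapacity = list.Capacity;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        // Each time the backing array fills up, a larger one is allocated
        // and the existing elements are copied into it.
        Console.WriteLine("Count=" + list.Count + ", Capacity=" + list.Capacity);
        lastCapacity = list.Capacity;
    }
}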

Upvotes: 0

Lucero

Reputation: 60190

The hashtable works by using buckets, each of which can hold several items (at least in most implementations; some instead probe into other buckets when the target bucket is already occupied). The number of buckets is usually a prime number, so that taking the hash code modulo the number of buckets gives an acceptable distribution for "good" hashes.

Usually, there is a certain fill factor which triggers the addition of more buckets and therefore a rebuild of the hashtable. Since bucket indices are computed as the hash modulo the bucket count, the instances need to be redistributed according to their new bucket index, which is basically a recreate from scratch.

For the .NET Hashtable, you can specify the "load factor" in some constructor overloads. From MSDN:

The load factor is the maximum ratio of elements to buckets. A smaller load factor means faster lookup at the cost of increased memory consumption. A load factor of 1.0 is the best balance between speed and size.
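For example (the capacity and the 0.5f load factor below are just placeholder values):

// A smaller load factor keeps buckets emptier, so lookups stay fast
// at the cost of extra memory; 1.0f is the documented balanced default.
Hashtable sparse = new Hashtable(1000, 0.5f);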

Upvotes: 1
