saegeoff
saegeoff

Reputation: 561

GetHashCode method with Dictionary and HashSet

I have a question about how the Dictionary and HashSet work in C#. According to my understanding, GetHashCode is used in hash tables to determine key uniqueness.

On the following MSDN page, it states:

A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary class, the Hashtable class, or a type derived from the DictionaryBase class.

Link: MSDN Object.GetHashCode

If that is the case, why does ContainsKey and Contains return false for car2 when it has the same hashcode as car1? If my understanding is correct and if what MSDN says is correct, shouldn't both of those return true?

class Program
{
    static void Main(string[] args)
    {            
        // Create a Dictionary and HashSet
        Dictionary<Car, int> carDictionary = new Dictionary<Car, int>();
        HashSet<Car> carSet = new HashSet<Car>();

        // Create 3 Cars (2 generic and 1 Civic)
        Car car1 = new Car();
        Car car2 = new Car();
        Car car3 = new Civic();

        // Test hash values
        int test1 = car1.GetHashCode(); // 22008501
        int test2 = car2.GetHashCode(); // 22008501
        int test3 = car3.GetHashCode(); // 12048305


        // Add 1 generic car and 1 Civic to both Dictionary and HashSet
        carDictionary.Add(car1, 1);
        carDictionary.Add(car3, 1);
        carSet.Add(car1);
        carSet.Add(car3);

        // Why are both of these false?
        bool dictTest1 = carDictionary.ContainsKey(car2);  // false
        bool setTest1 = carSet.Contains(car2); // false

        // Testing equality makes sense
        bool testA = car1.Equals(car2); // false
        bool testB = car1.Equals(car3); // false
    }
}

class Car
{
    public override int GetHashCode()
    {
        return 22008501;
    }
}

class Civic : Car
{
    public override int GetHashCode()
    {
        return 12048305;
    }
}

Upvotes: 5

Views: 7626

Answers (3)

Scott Chamberlain
Scott Chamberlain

Reputation: 127553

Because the logic of ContainsKey is similar to this.

//This is a simplified model for answering the OP's question, the real one is more complex.
private List<List<KeyValuePair<TKey,TValue>>> _buckets = //....

public bool ContainsKey(TKey key)
{
    List<KeyValuePair<TKey,TValue>> bucket = _buckets[key.GetHashCode() % _buckets.Length];
    foreach(var item in bucket)
    {
        if(key.Equals(item.Key))
            return true;
    }
    return false;
}

All GetHashCode does is get the bucket your key would go in, it still must go through each member of that bucket and find the exact match via the Equals method. That is why having good hash codes is important, the less items in a bucket the faster the foreach part will be. The best possible hashcode will have only one item per bucket.


Here is the actual code for Contains on a HashSet (Dictionary's ContainsKey is very similar but more complex)

private int[] m_buckets;
private Slot[] m_slots;

public bool Contains(T item) {
    if (m_buckets != null) {
        int hashCode = InternalGetHashCode(item);
        // see note at "HashSet" level describing why "- 1" appears in for loop
        for (int i = m_buckets[hashCode % m_buckets.Length] - 1; i >= 0; i = m_slots[i].next) {
            if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, item)) {
                return true;
            }
        }
    }
    // either m_buckets is null or wasn't found
    return false;
}

private int InternalGetHashCode(T item) {
    if (item == null) {
        return 0;
    } 
    return m_comparer.GetHashCode(item) & Lower31BitMask;
}

internal struct Slot {
    internal int hashCode;      // Lower 31 bits of hash code, -1 if unused
    internal T value;
    internal int next;          // Index of next entry, -1 if last
}

Upvotes: 11

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476594

The hashcodes don't have to be guaranteed to be unique, they must be equal if the keys are equal.

Now what happens is that the items are stored in buckets. If you query whether a Dictionary<TK,TV> contains a given key or the HashSet<T> a given item, it will first calculate the hashcode to fetch the correct bucket.

Next it will iterate over all items in the bucket and perform .Equals tests on it. Only in case one of these matches, it will return true.

In other words, one is allowed to return the same hashcode for every instance although the instances are different. It only makes the hashing inefficient.

C# thus stores a Dictionary<TK,TV> like:

+----------+
| 22008501 |---<car1,1>----<car3,1>----|
+----------+
| 11155414 | (other bucket)
+----------+

With on the left side (possible bucket labels), although for small Dictionary's, the number of buckets will be very small, and an operations will be performed on the hash (for instance a modulo), to make the number of outcomes smaller.

Now if you query whether car2 is in the Dictionary, it will calculate the hash, and thus take the first bucket. Then it will iterate, and perform an equality check on car1 vs car2, next car3 vs car2 and it will reach the end of the bucket and return false. This is because the default Equals operation is reference equality. Only if you override that too, (for instance all cars are the same, you can return true).

Upvotes: 3

recursive
recursive

Reputation: 86084

As you noticed, car1.Equals(car2) isn't true. Dictionary and Hashset membership will only be true for objects that are equal. That means .Equals() returns true. This is only tested if their hashcodes are first found to be equal.

Upvotes: 0

Related Questions