Vikram
Vikram

Reputation: 1627

Why is Equals() being not called for the all objects while adding to collection

I have a type which I am using as key in the IDictionary. The type is as following

public  class Employee
{
    public string Name { get; set; }
    public int ID { get; set; }

    public override bool Equals(object obj)
    {
        Employee emp = obj as Employee;
        if (emp != null)
            return emp.Name.Equals(this.Name);
        return false;
    }

    public override int GetHashCode()
    {
        return this.Name.GetHashCode();
    }
}

Now I have created a dictionary as following in my main as following

 IDictionary<Employee, int> empCollection = new Dictionary<Employee, int>();
        Employee emp1 = new Employee() { Name = "abhi", ID = 1 };
        Employee emp2 = new Employee() { Name = "vikram", ID = 2 };
        Employee emp3 = new Employee() { Name = "vikram", ID = 3 };

        empCollection.Add(emp1, 1);
        empCollection.Add(emp2, 2);
        empCollection.Add(emp3, 3);

Now while debugging I found out that when emp1 is added to the collection only GetHashCode method is called of the key type, after that when emp2 is added to the collection only GetHashCode method is called again but in the case of emp3 both GetHashCode and Equals methods are called.

May be it looks too naive being asking this question but why isn't Equals method not called when eqImp2 object is added to collection. What is happening inside. Please explain.

Upvotes: 13

Views: 1248

Answers (5)

Sam Harwell
Sam Harwell

Reputation: 99889

If you continue this experiment, you'll observe some behavior which is specific to the Dictionary<TKey, TValue> implementation, and some behavior that is required due to the way you implemented GetHashCode.

First, it's important to understand the role of GetHashCode and Equals when comparing objects for equality. Additional information is available on this question, but I'll repeat the basic rules here:

  1. The Equals method establishes exactly which objects are equal and which objects are not. All necessary checks need to be performed in this method for a final determination before returning.
  2. A hash code is a value calculated from the value of your object. Typically it is much smaller than the original object (in our case the hash code is a 4 byte integer) and not necessarily unique. However it is much faster to compute and compare to each other than the original objects themselves.
    • When hash codes do not need to be unique, different hash codes indicate different objects (i.e. Equals will definitely return false), but equal hash codes do not mean anything (i.e. Equals could return true or false).

Collections which associate values with a key object (e.g. IDictionary<TKey, TValue> in .NET, or Map<K, V> in Java) take advantage of the hash codes to improve implementation efficiency. However, since the documentation for Object.GetHashCode specifically does not require the results to be unique, these collections cannot rely on the hash codes alone for proper functionality. When two objects have the same hash code, only a call to Equals can distinguish them. The case you describe for the insertion of emp3 falls into this case: the [IDictionary<TKey, TValue>.Add] method needs to throw an ArgumentException if you are trying to insert the same value, and only a call to Equals can determine if the new key emp3 is equal to the previously inserted emp3.

Additional implementation characteristics

The particular collection implementation may result in more calls to GetHashCode than you anticipate. For example, when the internal storage of a hash table is resized, an implementation might call GetHashCode for every object stored in the collection. Collections based on a binary- or B-tree might only call GetHashCode once (if the results are cached in the tree structure), or might need to call GetHashCode for multiple objects during every insertion or lookup operation (if the results are not cached).

Sometimes hash table implementations need to call GetHashCode for multiple objects, or perhaps even Equals for objects with different hash codes due to the way they use modulo arithmetic to place keys into "buckets". The specific characteristics of this vary from one implementation to the next.

Upvotes: 1

misha
misha

Reputation: 2889

In your example the GetHashCode looks at the Name hash code. emp3 has the same name as emp2 , ("vikram"). They are equal given the hash code so it further looks using Equals.

Upvotes: 1

Pete
Pete

Reputation: 6723

emp2 and emp3 have the same key. This will cause a "key collision" in the dictionary. It first called GetHashCode() and determined the hash codes were the same. It then ensures they're the same by calling Equals(). The code from Dictionary is:

int num = this.comparer.GetHashCode(key) & 2147483647;
...
if (this.entries[i].hashCode == num && this.comparer.Equals(this.entries[i].key, key))

Obviously, if the hashcodes don't match, it never has to call Equals.

You should get a tool like ILSpy and then you can look at the code and find the answer yourself.

Upvotes: 1

Luis Filipe
Luis Filipe

Reputation: 8708

That is because GetHashCode is a shortcut. C# will first call GetHashCode which is supposed to be fast executing. If two objects have different HashCodes then there is no need to call the, assumingly, more expensive Equals method. Only if they have the same HashCode then it will call Equals. That is because HashCode is not guaranteed to be unique

Upvotes: 0

Jon
Jon

Reputation: 437386

The dictionary and all other similar containers use the hashcode as a quick-and-dirty check: different hashcodes mean that two objects are definitely not equal; identical hashcodes do not mean anything. The documentation of GetHashCode specifies this behavior by saying

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

Your emp1 and emp2 generate different hashcodes, so the dictionary does not need to run Equals; it already knows they are not equal. On the other hand, emp2 and emp3 generate the same hashcode so the dictionary must call Equals to definitely determine if they are indeed equal, or if the identical hashcode was just the result of chance.

Upvotes: 9

Related Questions