Wayne Werner
Wayne Werner

Reputation: 51817

How does VB.NET decide what key is unique in Dictionary(Of)?

I have the following class:

Public Class Pair(Of T1, T2)

    Public Property First As T1
    Public Property Second As T2

    Public Sub New(Optional ByVal first As T1 = Nothing, Optional ByVal second As T2 = Nothing)
        Me.First = first
        Me.Second = second
    End Sub

    Public Overrides Function tostring() As String
        Return String.Format("<[{0},{1}]>", First, Second)
    End Function

    Public Overrides Function GetHashCode() As Integer         
        Return Integer.Parse(String.Format("{0}{1}", First.GetHashCode, Second.GetHashCode))
    End Function
End Class

However, when I create a dictionary using Pair as the key:

    Dim Pairs as Dictionary(Of Pair(Of Integer, Integer), String)

    Dim p = new Pair(of integer, integer)(1234, 13)
    Dim p2 = new Pair(of integer, integer)(1234, 13)

    console.writeline(String.Format("Hash 1:{0} Hash 2:{1}", p.gethashcode(), p2.gethashcode()))
    Pairs.add(p, "Hello")

    Console.WriteLine(Pairs(p2))

My expectation is that since both p and p2 have the hash code of 123413 they would hit the same dictionary element and that the WriteLine would display "Hello". What really happens, however, is that I get a KeyNotFoundException, which leads me to believe that the Dictionary (Of...) doesn't actually use the GetHashCode method.

So what do I need to do to make both of these Pairs refer to the same dictionary element?

Thanks!

Upvotes: 0

Views: 1773

Answers (3)

Paul Walls
Paul Walls

Reputation: 6044

A few things:

GetHashCode is used by the Dictionary to figure out where to store the key internally, but it's not an exact match scenario. In ideal circumstances, the hash code should map each key to a unique slot index, making lookups extremely quick.

In practice, values in a Dictionary are stored at an index in an array . With 2^32 different types of hash codes, it is infeasible to create an array index for each hash code, so the Dictionary transforms the hash code into an array index where the values are stored. Because of this, the Dictionary experiences what are called "hash collisions". This means that different keys will map to the same hash value.

Dictionary's are good at managing this, but ultimately when two or more hash codes create the same index (which will happen when the collection gets big enough), the Equals method must determine which key to use to locate the Key/Value pair that contains the value you are after. If Equals is false for all items in the bucket, then it returns the KeyNotFoundException you experienced.

On to the code:

While you could override Equals, I don't see why you need to. For starters, I would get rid of your GetHashCode. You will eventually have problems with it, as shown here:

Dim p = new Pair(of integer, integer)(Int32.MaxValue, Int32.MaxValue)
p.gethashcode() 'BOOM!!!

Instead, based on what you are doing here, I'd recommend that you convert your Pair class to a struct (Structure in VB), leaving Equals and GetHashCode alone. This is really only a good idea if you are assigning value types (int, byte, bool, etc...) to the Pair because of performance reasons. I would really consider this though.

If you have to have a class, create a representative key that returns a type that is suitable for the Dictionary. For instance, because KeyValuePair is a value type, it will be compared based on its value, not a reference.

Public Function GetKey() As KeyValuePair(Of T1, T2)        
    Return New KeyValuePair(Of T1, T2)(First, Second)
End Function

And your Dictionary becomes

Dim Pairs as Dictionary(Of KeyValuePair(Of Integer, Integer), String)
Pairs.add(p.GetKey(), "Hello")
Console.WriteLine(Pairs(p2.GetKey()))

(If there are any syntax errors, it is because I am not a VB programmer.)

Upvotes: 1

Enigmativity
Enigmativity

Reputation: 117084

You have a number of issues you have to deal with here.

To start with - you must also override Equals as GetHashCode is only meant to be a fast method to determine if two objects are not equal. It never is meant to tell you if the objects are equal. That's what Equals is for - it is meant to be the final check that two objects are equal and it could be much slower to compute than GetHashCode.

Here's an example. Say you have three very long strings, two that are the same length and the other different. If GetHashCode returned the length of the strings then you could very quickly determine that the third string is definitely not equal to the first two. You'd have to check the actual contents of the first two to see if they are equal and this could be a lengthy process comparatively.

The next thing, and this is equally important, you cannot have a hash code that changes during the lifetime of the object.

The key word is cannot. It will break things.

The Dictionary(Of ,) class uses a series of "buckets" to quickly find values based on the hash code of the key. So, if your key's hash code changes after it was added to the dictionary then the dictionary will not be able to find it and it will allow you to add the key twice!

Here's an example:

Dim d = New Dictionary(Of Pair(Of Integer, Integer), String)
Dim p = new Pair(Of Integer, Integer)(10, 11)
d.Add(p, "James")
Dim before = d(p) ' Found!
p.First = 12
Dim after = d(p) ' NOT Found!

Or this:

Dim d = New Dictionary(Of Pair(Of Integer, Integer), String)
Dim p = new Pair(Of Integer, Integer)(10, 11)
d.Add(p, "James")
p.First = 12
d.Add(p, "Tom") ' ALLOWED!

This is why mutable structs are bad.

I hope this helps.

Upvotes: 0

Jon Skeet
Jon Skeet

Reputation: 1500903

Having the same hash code isn't enough - the two keys need to be equal too, i.e. key1.Equals(key2) has to be true (or the equivalent under a custom comparer).

You haven't overridden Equals, so two Pair objects are always unequal.

(Note that your hash code function will also fail in various ways, such as if they're both negative. Why not just combine the two integer values in some way?)

I don't know VB well enough to come up with the suitable override myself when I ought to be going to bed, but in C# it would be something like:

public override bool Equals(object other)
{
    if (other == null)
    {
        return false;
    }
    if (other.GetType() != this.GetType())
    {
        return false;
    }
    var otherPair = (Pair<T1, T2>) other;
    return EqualityComparer<T1>.Default(this.First, otherPair.First) &&
           EqualityComparer<T2>.Default(this.Second, otherPair.Second);
}

(I'd use EqualityComparer<T>.Default for the hash code generation too, by the way.)

Upvotes: 3

Related Questions