Andriy Horen
Andriy Horen

Reputation: 2940

Implementing GetHashCode in C#. Null-value handling

Before I begin, all code samples here I tested on Mono environment and there is one noticeable difference in the GetHashCode implementations:

string.Empty.GetHashCode(); // returns 0 in Mono 3.10
string.Empty.GetHashCode(); // returns 757602046 in .NET 4.5.1

I made my implementation based on this SO Answer by @JonSkeet and in the comments he also suggests to use 0 hash code for NULL values (wasn't sure how should I hash them).

I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field.

So having following implementation (Mono 3.10):

public class Entity {
    public int EntityID { get; set; }
    public string EntityName { get; set; }

    public override int GetHashCode() {
        unchecked {
            int hash = 15485863;       // prime number
            int multiplier = 1299709;  // another prime number

            hash = hash * multiplier + EntityID.GetHashCode();
            hash = hash * multiplier + (EntityName != null ? EntityName.GetHashCode() : 0);

            return hash;
        }
    }
}

It is quite easy to find collision e.g.

var hash1 = new Entity { EntityID = 1337, EntityName = "" }.GetHashCode();
var hash2 = new Entity { EntityID = 1337, EntityName = null }.GetHashCode();

bool equals = hash1 == hash2; // true

I could replace null-value 0 with some other number, however it won't fix the problem as there still is a chance that some hash(string) output will generate such number and I'll get another collision.

My question: How should I handle null values while using algorithm from example above?

Upvotes: 5

Views: 1963

Answers (2)

Yuval Itzchakov
Yuval Itzchakov

Reputation: 149518

My question: How should I handle null values while using algorithm from example above?

I don't think the problem is with null per-se. The problem lays in the fact you're using GetHashCode for equality, which it isn't meant for. GetHashCode should provide such hashes that aspire to normal distribution.

The docs say:

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes.

And then goes on to specify the purpose of GetHashCode:

A hash code is intended for efficient insertion and lookup in collections that are based on a hash table.

You should be implementing IEquatable<Entity>, where you actually define the equivalence relation of two entities. And override != and == while you're at it.

An approximation:

public class Entity : IEquatable<Entity>
{
    public int EntityId { get; set; }
    public string EntityName { get; set; }

    public bool Equals(Entity other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return EntityId == other.EntityId && 
               string.Equals(EntityName, other.EntityName);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Entity) obj);
    }

    public static bool operator ==(Entity left, Entity right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Entity left, Entity right)
    {
        return !Equals(left, right);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (EntityId*397) ^ (EntityName != null ? EntityName.GetHashCode() : 0);
        }
    }
}

Upvotes: 4

Nitram
Nitram

Reputation: 6716

Your "problem" here is that you are trying the get collision free hash codes. While this is perfect for the lookup performance of collection implementations that use the hash code for lookup (e.g. HashSet and Dictionary) in the most cases this will not work.

The reason for that is that the hash code is just a 32-bit integer value and it represents data that is usually a lot bigger (multiple integer values, strings, etc.).

So the hash code is only there to define that two objects could be equal. The collection classes use the hash code to refine the area where the object is stored and use the equals function to find if two objects are really the same. For that reason you should always implement the Equals function for classes you implemented the hash code for. While those classes will fall back to the equals function of object, is it also a good idea to implement the IEquatable<T> interface to avoid typing problems of any kind (still overwrite the default equals method of Object!)

Upvotes: 4

Related Questions