brdmllr
brdmllr

Reputation: 43

Handling collections in GetHashCode implementation

I'm working on implementing GetHashCode() based on the HashCode struct in this answer here. Since my Equals method will consider collections using Enumerable.SequenceEqual(), I need to include the collections in my GetHashCode() implementation.

As a starting point, I'm using Jon Skeet's embedded GetHashCode() implementation to test the output of the HashCode struct implementation. This works as expected using the following test below -

private class MyObjectEmbeddedGetHashCode
{
    public int x;
    public string y;
    public DateTimeOffset z;

    public List<string> collection;

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;

            hash = hash * 31 + x.GetHashCode();
            hash = hash * 31 + y.GetHashCode();
            hash = hash * 31 + z.GetHashCode();

            return hash;
        }
    }
}

private class MyObjectUsingHashCodeStruct
{
    public int x;
    public string y;
    public DateTimeOffset z;

    public List<string> collection;

    public override int GetHashCode()
    {
        return HashCode.Start
            .Hash(x)
            .Hash(y)
            .Hash(z);
    }
}

[Test]
public void GetHashCode_CollectionExcluded()
{
    DateTimeOffset now = DateTimeOffset.Now;

    MyObjectEmbeddedGetHashCode a = new MyObjectEmbeddedGetHashCode() 
    { 
        x = 1, 
        y = "Fizz",
        z = now,
        collection = new List<string>() 
        { 
            "Foo", 
            "Bar", 
            "Baz" 
        } 
    };

    MyObjectUsingHashCodeStruct b = new MyObjectUsingHashCodeStruct()
    {
        x = 1,
        y = "Fizz",
        z = now,
        collection = new List<string>() 
        { 
            "Foo", 
            "Bar", 
            "Baz" 
        }
    };

    Console.WriteLine("MyObject::GetHashCode(): {0}", a.GetHashCode());
    Console.WriteLine("MyObjectEx::GetHashCode(): {0}", b.GetHashCode());

    Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
}

The next step is to consider the collection in the GetHashCode() calculation. This requires a small addition to the GetHashCode() implementation in MyObjectEmbeddedGetHashCode.

public override int GetHashCode()
{
    unchecked
    {
        int hash = 17;

        hash = hash * 31 + x.GetHashCode();
        hash = hash * 31 + y.GetHashCode();
        hash = hash * 31 + z.GetHashCode();

        int collectionHash = 17;

        foreach (var item in collection)
        {
            collectionHash = collectionHash * 31 + item.GetHashCode();
        }

        hash = hash * 31 + collectionHash;

        return hash;
    }
}

However, this is a little bit more difficult in the HashCode struct. In this example, when a collection of type List<string> is passed into the Hash<T> method, T is List<string> so trying to cast obj to ICollection<T> or IEnumerable<T> doesn't work.

I can successfully cast to IEnumerable, but it causes boxing and I found I have to worry about excluding types like string that implement IEnumerable.

Is there a way to reliably cast obj to ICollection<T> or IEnumerable<T> in this scenario?

public struct HashCode
{
    private readonly int hashCode;

    public HashCode(int hashCode)
    {
        this.hashCode = hashCode;
    }

    public static HashCode Start
    {
        get { return new HashCode(17); }
    }

    public static implicit operator int(HashCode hashCode)
    {
        return hashCode.GetHashCode();
    }

    public HashCode Hash<T>(T obj)
    {
        // I am able to detect if obj implements one of the lower level
        // collection interfaces. However, I am not able to cast obj to
        // one of them since T in this case is defined as List<string>,
        // so using as to cast obj to ICollection<T> or IEnumerable<T>
        // doesn't work.
        var isGenericICollection = obj.GetType().GetInterfaces().Any(
            x => x.IsGenericType && 
            x.GetGenericTypeDefinition() == typeof(ICollection<>));

        var c = EqualityComparer<T>.Default;

        // This works but using IEnumerable causes boxing.
        // var h = c.Equals(obj, default(T)) ? 0 : ( !(obj is string) && (obj is IEnumerable) ? GetCollectionHashCode(obj as IEnumerable) : obj.GetHashCode());

        var h = c.Equals(obj, default(T)) ? 0 : obj.GetHashCode();
        unchecked { h += this.hashCode * 31; }
        return new HashCode(h);
    }

    public override int GetHashCode()
    {
        return this.hashCode;
    }
}

Upvotes: 4

Views: 5906

Answers (1)

Peter Duniho
Peter Duniho

Reputation: 70661

You can address the collection issue in a couple of ways:

  1. Use a non-generic interface, e.g. ICollection or IEnumerable.
  2. Add an overload for the Hash() method, e.g. Hash<T>(IEnumerable<T> list) { ... }

That said, IMHO it would be better to just leave the struct HashCode alone and put the collection-specific code in your actual GetHashCode() method. E.g.:

public override int GetHashCode()
{
    HashCode hash = HashCode.Start
        .Hash(x)
        .Hash(y)
        .Hash(z);

    foreach (var item in collection)
    {
        hash = hash.Hash(item);
    }

    return hash;
}

If you do want a full-featured version of the struct HashCode type, it looks to me as though that same page you referenced has one: https://stackoverflow.com/a/2575444/3538012

The naming of the members is different, but it's basically the same idea as the struct HashCode type, but with overloads for other complex types (as in my suggestion #2 above). You could use that, or just apply the techniques there to your implementation of struct HashCode, preserving the naming conventions used in it.

Upvotes: 5

Related Questions