dark_ruby

Reputation: 7866

LINQ join is not using Equals from provided EqualityComparer, uses GetHashCode instead

When I pass an EqualityComparer as the last parameter to the LINQ Join method, it does not use the comparer's Equals method. For some reason it uses GetHashCode to compare items.

Is it possible to make it use Equals instead?

        var ss = new string[] { "aa", "bb", "cc" };
        var zz = new string[] { "aa", "zz", "cc" };

        var res = ss
            .Join(zz, 
                o => o, 
                i => i, 
                (i, o) => i + o, 
                new GenericEqualityComparer<String>((x,y) => x == y))
            .ToList();

Upvotes: 0

Views: 434

Answers (2)

Gert Arnold

Reputation: 109117

When LINQ uses an IEqualityComparer<T> to compare two objects, it first compares their hash codes. Only if the hash codes are equal is the Equals method called to refine the comparison. So in your case it should hit Equals at least twice.

To demonstrate how an equality comparer is used, I made a little code snippet in LINQPad:

void Main()
{
    var ss = new string[] { "aa1", "bb1", "cc1" };
    var zz = new string[] { "aa2", "aa3", "zz2", "cc2" };

    var res = ss.Join(zz,  o => o, i => i, (i, o) => i + o,
        new SubstringComparer()).ToList();
}

public class SubstringComparer : IEqualityComparer<string>
{
    public bool Equals(string left, string right)
    {
        string.Format("{0} - {1}", left, right).Dump();
        return left.Substring(0,2) == right.Substring(0,2);
    }

    public int GetHashCode(string value)
    {
        value.Dump();
        return value.Substring(0,2).GetHashCode();
    }
}

So strings are equal if their first two characters are equal. The output is:

aa2
aa3
aa2 - aa3
zz2
cc2
aa1
aa2 - aa1
bb1
cc1
cc2 - cc1

And the resulting list:

aa1aa2
aa1aa3
cc1cc2

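You can see that the second (inner) list is processed first. Join builds a hash-based lookup from the inner sequence, so its hash codes are computed, and items with equal hash codes are compared, before any element of the first list is touched. Only then is each element of the first list hashed and compared against the entries with the same hash code.

So if your GenericEqualityComparer never hits Equals, its GetHashCode must be returning a different hash code for every item, even for items that should be equal, which I would consider a bug in the comparer. If Equals is only skipped some of the time, the above is the explanation. And if you want a comparer to always call Equals, make GetHashCode return the same value for every item (which is inefficient, of course).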
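
As a minimal sketch of that last point: the source of the question's GenericEqualityComparer is not shown, so the DelegateEqualityComparer<T> class below is a hypothetical stand-in, and the JoinDemo wrapper exists only for illustration. When no hash function is supplied, it returns a constant hash code, so every key lands in the same bucket and the Equals lambda decides each match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's GenericEqualityComparer<T>,
// whose implementation is not shown in the thread.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    // No hash function given: fall back to a constant hash code, so all keys
    // share one bucket and Equals is consulted for every candidate pair.
    public DelegateEqualityComparer(Func<T, T, bool> equals)
        : this(equals, _ => 0)
    {
    }

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    public bool Equals(T x, T y) => _equals(x, y);

    public int GetHashCode(T obj) => _getHashCode(obj);
}

public static class JoinDemo
{
    // Same data as the question; the result selector receives (outer, inner).
    public static List<string> Run()
    {
        var ss = new[] { "aa", "bb", "cc" };
        var zz = new[] { "aa", "zz", "cc" };

        return ss
            .Join(zz, o => o, i => i, (o, i) => o + i,
                new DelegateEqualityComparer<string>((x, y) => x == y))
            .ToList();
    }
}
```

With the constant hash the join degrades to comparing every outer key against every inner key, so for anything but small inputs it is better to pass a real hash function as the second constructor argument, one that returns equal values for all items the Equals lambda treats as equal.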

Upvotes: 1

AlanT

Reputation: 3663

https://stackoverflow.com/a/3719802/136967

has a very good explanation. Basically, comparisons are done using Equals(), but the LINQ code also uses GetHashCode() when doing the processing, and if GetHashCode() is not implemented consistently with Equals() it will give strange answers.
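
To illustrate what "not implemented correctly" can look like, here is a small sketch. All type names in it are made up for this example. Both comparers treat strings as equal regardless of case, but only the second one also hashes them case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Inconsistent: Equals ignores case, but GetHashCode is case-sensitive,
// so "AA" and "aa" usually get different hash codes and Equals is never asked.
public sealed class InconsistentComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(string s) => s.GetHashCode();
}

// Consistent: strings that Equals treats as equal always hash to the same value.
public sealed class ConsistentComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(string s) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(s);
}

public static class HashContractDemo
{
    // Counts how many pairs Join produces for the given comparer.
    public static int CountMatches(IEqualityComparer<string> comparer)
    {
        var outer = new[] { "AA", "bb" };
        var inner = new[] { "aa", "zz" };

        return outer.Join(inner, o => o, i => i, (o, i) => o + i, comparer).Count();
    }
}
```

HashContractDemo.CountMatches(new ConsistentComparer()) finds the one "AA"/"aa" pair. With InconsistentComparer the same call will normally find nothing, because the differing hash codes rule the pair out before Equals is ever called.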

hth,
Alan.

Upvotes: 0
