hooch
hooch

Reputation: 1175

HashMap yields same hashCode for different contents

I have the following unit test: I create 2 different Objects of my custom type Variable. I compare their hash codes which simply returns the hash code of their names, ie String.hashCode().

Then I create 2 HashSets, each holding one variable and compare the hash codes of the sets.

In both cases the hash codes differ, as expected.

However, if I create a HashMap with the name as index and the Variable as value, the assertion fails, ie they compare the same. Why is that?

Using Oracle Java 1.8.

EDIT: I can add another guarantee: Assert.assertNotEquals(map1, map2); also holds. Furthermore I think I interpret this sentence correctly:

The hash code of a map is defined to be the sum of the hash codes of each entry in the map's entrySet() view. This ensures that m1.equals(m2) implies that m1.hashCode()==m2.hashCode() for any two maps m1 and m2, as required by the general contract of Object.hashCode(). Taken from http://docs.oracle.com/javase/7/docs/api/java/util/AbstractMap.html#hashCode%28%29

@Test
public void test() {
    // this assertion holds
    Assert.assertNotEquals(new Variable("x").hashCode(), new Variable("y").hashCode());

    Set<Variable> set1 = new HashSet<>();
    set1.add(new Variable("x"));
    Set<Variable> set2 = new HashSet<>();
    set2.add(new Variable("y"));
    // this assertion also holds
    Assert.assertNotEquals(set1.hashCode(), set2.hashCode());

    HashMap<String, Variable> map1 = new HashMap<>();
    map1.put("x", new Variable("x"));
    HashMap<String, Variable> map2 = new HashMap<>();
    map2.put("y", new Variable("y"));
    // why does this assertion fail?
    Assert.assertNotEquals(map1.hashCode(), map2.hashCode());
}

Here is the definition of class Variable.

public class Variable {
    private String name;

    public Variable(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null || !(obj instanceof Variable))
            return false;
        return name.equals(((Variable) obj).name);
    }
}

Upvotes: 1

Views: 602

Answers (2)

John Kugelman
John Kugelman

Reputation: 361547

Oracle's implementations of both AbstractMap.Entry and HashMap.Entry define hashCode as:

public int hashCode() {
    return (key   == null ? 0 :   key.hashCode()) ^
           (value == null ? 0 : value.hashCode());
}

Notice the XOR operator. If the key and value both have the same hash code they will cancel out when XOR'ed, and the overall hash code for that entry will be 0.

This happens in your code because the hash code for a Variable is the same as the string you pass to it, and those strings are the same as the keys.

It's worth noting that different objects aren't guaranteed to have different hash codes. The only guarantee with hash codes is that equal objects will have equal hash codes. Unequal objects will usually have different hash codes if the hash function is good, but it's not a guarantee.

As it turns out, this isn't just a theoretical possibility. It's quite possible in real programs!

The logical follow-up question is then: how can I avoid this? The HashMap is sort of a symbol table which I really like to index by name. And atm I don't have any more useful members for class Variable to include in equals() and hashCode(). Any ideas?

You could give Variable a different hash code from the built-in String implementation. An easy way to do that would be to use the stock implementation but change the multiplier from 31 to some other prime number.

For example:

private int hash;

@Override public int hashCode() {
    int h = hash;
    if (h == 0) {
        int len = name.length();
        h = 1;
        for (int i = 0; i < len; i++) {
            h = 47*h + name.charAt(i);
        }
        hash = h;
    }
    return h;
}

This is a modified version of OpenJDK's String.hashCode() implementation. I added h = 1 so even 1-character strings will be different.

Upvotes: 1

Sotirios Delimanolis
Sotirios Delimanolis

Reputation: 279880

This is just a coincidence. The HashMap.Node implementation (which HashMap#hashCode() uses) of hashCode() is

public final int hashCode() {
    return Objects.hashCode(key) ^ Objects.hashCode(value);
}

Where key and value have both the same hashCode. For example, key is "x" and value is a Variable object created with a name of "x" (which it uses for its hashCode). In other words, "x".hashCode() and new Variable("x").hashCode() are equal.

Any value ^ value is equal to 0. So the hashCode of your map comes out as 0 for both maps.

Upvotes: 1

Related Questions