CodeFarmer
CodeFarmer

Reputation: 2708

Weird HashSet contains() behaviour

HashSet in java confused me a lot, when using contains() will it look up hashcode() and equals() result ? But in this case, it doesn't behave normal. Sometime it's problematic if you put this kind of code in large project. Problem is why last statement print FALSE ?what contains() do under the hood ?

class R
{
    int count;
    public R(int count)
    {
        this.count = count;
    }
    public String toString()
    {
        return "R(count attribution:" + count + ")";
    }
    public boolean equals(Object obj)
    {
        if (obj instanceof R)
        {
            R r = (R)obj;
            if (r.count == this.count)
            {
                return true;
            }
        }
        return false;
    }
    public int hashCode()
    {
        return this.count;
    }
}
public class TestHashSet2
{
    public static void main(String[] args) 
    {
        HashSet hs = new HashSet();
        hs.add(new R(5));
        hs.add(new R(-3));
        hs.add(new R(9));
        hs.add(new R(-2));

        System.out.println(hs);

        //change first element
        Iterator it = hs.iterator();
        R first = (R)it.next();
        first.count = -3;
        System.out.println(hs);
        //remove
        hs.remove(new R(-3));
        System.out.println(hs);

        R r1 = new R(-3);
        System.out.println(r1.hashCode());
        Iterator i = hs.iterator();
        R r2 = (R)i.next();
        System.out.println(r2.hashCode());   //same hashcode -3
        System.out.println(r1.equals(r2));   //equals true

        System.out.println("hs contains object which count=-3 ?" + hs.contains(new R(-3)));  //false
    }
}

Upvotes: 4

Views: 1997

Answers (4)

Stephen C
Stephen C

Reputation: 718886

The problem is that the hashcode of an R object can change. This is a violation of the contract that the hashCode() method is supposed to obey.


To understand why this matters, you need to understand how a hash table works. A Java HashSet has at its core an array of lists of entries. When you put an object into the hash table, it first calculates the hashcode of the object. Then it reduces the hash code to an index in the array by calculating

index = hashcode % array.length

Then it searches the chain starting at array[index], and if the object is not present in the list, it adds it.

And to test if the HashSet contains an object, it does the same calculations and searches the same hash chain.

However, if you do something to the object to cause its hashcode to change while it is in the table, then the algorithm above will (typically) look for the object in a different chain to the chain it was originally added to. And of course it won't find it.

The net result is that the HashSet will behave anomalously if the hashcode contract is broken for any object while the object is a member of the set.


Here's what the Java 7 javadoc says (see java.jang.Object#hashcode() ):

"The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

  • ...

The "provided no information ..." caveat puzzles me. I think it only works if there is also a rule about not causing object hashcodes to change while they are in a hash table. Unfortunately, this rule is not stated in any of the places you'd expect to find it. Documentation bug?


Perhaps we should call the requirement about not changing hashcodes a "verbal contract"? :-)

Upvotes: 1

Andreas Dolk
Andreas Dolk

Reputation: 114787

The HashSet stores the values in buckets, the bucket index is calculated when you add the element to the hash set. The idea behind it: now the set can read an objects hashcode and calculate the bucket in a single step. In other words: contains() is a O(1) operation.

Imagine a trivial hash set:

bucket    object(hashcode)
#1        5
#2        -3
#3        6

with a hash function to calculate buckets like:

f(hashcode) :=  |  5 -> 1
                | -3 -> 2
                |  6 -> 3

Now have look at what you done in your example: you've removed the object in bucket 2 (changes the function) and changed the hashcode of the object in bucket 1.

The new function looks like:

f(hashcode) :=  |  5 -> 1
                |  6 -> 3

f(-3) will return null (contains() returns false) and your actual object with hashcode -3 is stored where an object with hashcode 5 should be.

Upvotes: 2

Hari Menon
Hari Menon

Reputation: 35405

This is precisely why you should not be using mutable objects as keys in HashSets and HashMaps.

The first iterator returned the R object with hashCode 5. You then changed a property of that object (count). But that does not force the hash to be recalculated. So, as far as the HashSet is concerned, the object for which you changed the count to -3 still lies in the bucket corresponding to hash code 5. Then you removed the object which lies in the bucket corresponding to hash code -3, which was the original R(-3) object. So after that operation, there is no object in the bucket for hash code -3, and so contains(new R(-3)) returns false.

Upvotes: 2

Michael Borgwardt
Michael Borgwardt

Reputation: 346317

By changing an the value of an object after inserting it into the HashSet, you are destroying the integrity of the data structure. After that, you cannot rely on it doing its job.

It's generally a bad idea to use mutable objects as keys for any map or values for a set. Fortunately, the classes used most commonly for this purpose (String, Integer) are immutable.

Upvotes: 6

Related Questions