hetaoblog
hetaoblog

Reputation: 2030

Why does Object.hashcode() have conflicts in Java?

I run the code below in Hotspot JDK 1.6 on Windows XP, I ran it twice and I got the results below.

So basically it seems the object.hashcode() also have conflicts? it looks like it's not returning the memory address in the VM.

However, a comment in the JDK said the values should be distinct, can anyone explain?

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

@return  a hash code value for this object.
@see     java.lang.Object#equals(java.lang.Object)
@see     java.util.Hashtable

This is the first result:

i,hashcode(): 361,9578500
i,hashcode(): 1886,9578500
conflict:1886, 361
i,hashcode(): 1905,14850080
i,hashcode(): 2185,14850080
conflict:2185, 1905
9998

This is the 2nd result:

i,hashcode(): 361,5462872
i,hashcode(): 1886,29705835
conflict:1887, 362
i,hashcode(): 1905,9949222
i,hashcode(): 2185,2081190
conflict:2186, 1906
9998
10000

My code:

@Test
    public void testAddr()
    {
        Set<Integer> s = new TreeSet<Integer>();
        Map<Integer, Integer> m = new TreeMap<Integer, Integer>();
        Set<Object> os = new HashSet<Object>();

        for(int i = 0; i < 10000; ++i)
        {
            Object o = new Object();
            os.add(o);
            Integer h = o.hashCode();

            if((i == 361) || (i == 1886) || (i == 2185) || (i == 1905))
            {
                System.out.println("i,hashcode(): " + i + "," + h);
            }

            if(s.contains(h))
            {
                System.out.println("conflict:" + i + ", " + m.get(h));
            }
            else
            {
                s.add(h);   
                m.put(h,  i);
            }

        }


        System.out.println(s.size());

        int c = 0;
        for(Object o: os)
        {
            c++;
        }

        System.out.println(c);
    }

Upvotes: 2

Views: 1256

Answers (5)

supercat
supercat

Reputation: 81115

It is possible that a long-running program may create, call hashCode() upon, and abandon many billions of objects during the time that it runs. Thus, it would be mathematically impossible to ensure that ensuring that once some object hashCode returned a particular number, no other object would ever return that same number for the life of the program. Even if hashCode() somehow managed to return unique values for the first 4,294,967,296 objects, it would have no choice but to return an already-used value for the next one (since the previous call would have used the last remaining formerly-unused value).

The fact that hashCode() clearly cannot guarantee that hash values won't get reused for the life of the program does not mean it couldn't guarantee that hash codes won't get reused during the lifetime of the objects in question. Indeed, for some memory-management schemes, such a guarantee could be made relatively cheaply. For example, the 1984 Macintosh split the heap into two parts, one of which held fixed-sized object descriptors, and one of which held variable-sized object data. The object descriptors, once created, would never move; if any objects were deleted, the space used by their descriptors would get reused when new objects were created. Under such a scheme, the address of an object descriptor would represent a unique and unchanging representation of its identity for as long as the object existed, and could thus be used as a hashCode() value. Unfortunately, such schemes tend to have more overhead than some other approaches in which objects have no fixed address associated with them.

Upvotes: 1

Adam Mihalcin
Adam Mihalcin

Reputation: 14458

hashCode() is supposed to be used for placing objects in hash tables. The rule for hashCode is not that hashCode should never generate conflicts, although that is a desirable property, but that equal objects must have equal hash codes. This does not preclude non-equal objects from having equal hash codes.

You have found a case where the default Object.hashCode() implementation does generate equal hash codes for non-equal objects. It is required that the hash code of an object not change unless there is a change to some field affection equality of that object with another. One possible cause is that the garbage collector rearranged memory so that a later instantiation of o was at the same location as an earlier instantiation of o (that is, you allocated two objects o in the loop, and the garbage collector rearranged memory in between the two allocations so that the old o was moved out of one location of memory and the new o was then allocated at that location). Then, even though the hash code for the old o cannot change, the hash code for the new o is the address where the new o is stored in memory, which happens to be equal to the hash code for the old o.

Upvotes: 5

Tom Hawtin - tackline
Tom Hawtin - tackline

Reputation: 147124

It's an unfortunately common misinterpretation of the API docs. From a still-unfixed (1 vote) bug for this some time ago.

(spec) System.identityHashCode doc inadequate, Object.hashCode default implementation docs mislead

[...]

From Usenet discussions and Open Source Software it appears that many, perhaps majority, of programmers take this to mean that the default implementation, and hence System.identityHashCode, will produce unique hashcodes.

The suggested implementation technique is not even appropriate to modern handleless JVMs, and should go the same way as JVM Spec Chapter 9.

The qualification "As much as is reasonably practical," is, in practice, insufficient to make clear that hashcodes are not, in practice, distinct.

Upvotes: 2

deraj
deraj

Reputation: 91

Hashcodes do not have to be unique just consistent. Although more often than not they are typically fairly unique.

In addition to your excerpt above Object has the following to say.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

Object Doc

Upvotes: 0

SLaks
SLaks

Reputation: 887195

The comment does not say that it is distinct.
It says that it is distinct as much as is reasonably practical.

Apparently, you found a case where it wasn't practical.

Upvotes: 0

Related Questions