Reputation: 17648

Incrementally ascending hash codes in a collection of objects

Hi guys : I'm trying to scan through some objects to see if there are any repeats. To do this, Im using the hashCode field. The objects are serialized in a binary file.

It looks like this :

hashCode=26594 hashCode=26595 hashCode=26596 ...

I would never expect that hashCodes from a collection would exhibit such a pattern unless the JVM or thrift creates hashCodes on the fly for some objects, in certain instances (or maybe, each object created internally has a hashCode that is set to a statically incremented value).

Of course, this question has no definite answer at this point - but, in general, is there a reason or a common case where a stream of objects would have incrementally increasing hashCodes ? Maybe if there is a scenario where somebody has seen such a phenomenon in the past, it might help me shed light on the binary file which I'm trying to understand.

Minor detail : these objects are binary serialized using the Apache thrift library, and they are read/written in java/hadoop, entirely.

Upvotes: 0

Answers (3)

deterb

Reputation: 4014

Could they be a sequence of numbers?

Looking at the code for Integer and Long, their hash codes are essentially that number and consecutive numbers will pretty much have consecutive hash codes.

Note that Long will only be consecutive up to Integer.MAX_VALUE, after that it is not as consecutive, though still well patterned.

Upvotes: 1

Gray

Reputation: 116888

is there a reason or a common case where a stream of objects would have incrementally increasing hashCodes ? Maybe if there is a scenario where somebody has seen such a phenomenon in the past, it might help me shed light on the binary file which I'm trying to understand.

The short answer is that it is interesting but certainly not wrong. The object's class in question is generating the hashCode() -- it is nothing to do with the serialization unless for some reason the hashcode value has been calculated during object construction which would be much more bizarre.

You have to remember that the hashcode is typically used with a mod function to place a value in a hash-bucket. As long as the value returned by the hashCode() method obeys the specs, it is fine:

the hashCode method must consistently return the same integer for the same object value, provided no information used in equals comparisons on the object is modified
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result
the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

It may be that it is using some sort of database ID that was generated and are monotonically increasing on purpose. Or this is some sort of Hadoop pattern to track unique results or something.

Upvotes: 1

Mister Smith

Reputation: 28168

If you need to check for duplicates you should use the equals method instead of hashCode. If you read the javadoc for Object.hashCode, it says:

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

This means that you can have two Objects o1 and o2 with the same hashCode value but where o1.equals(o2) = false. You'll be detecting a false duplicate.

To check for duplicates you can use a Set, and check for each added object if Set.add(object) == true. If it returns false, it means that it was already in the set.

The incremental hash in your description seems to me a very bad hash function, unless all the objects are the same class and are there's also an incremental relation between them. For instance, run this code:

    List l1 = Arrays.asList(1,2,3,4,5,6,7,8,9);
    for (Object object : l1) {
        System.out.println("hashCode: " + object.hashCode());
    }

You are not saying if the objects are your own defined classes not. If they were yours, always remember that if you override equals, you should always override hashCode as well. If not, you are violating the hashCode contract and some classes (like the hashed collections) may not behave as you'd expect.

Upvotes: 1

Incrementally ascending hash codes in a collection of objects

Answers (3)

Related Questions