Reputation: 17648
Hi guys : I'm trying to scan through some objects to see if there are any repeats. To do this, Im using the hashCode field. The objects are serialized in a binary file.
It looks like this :
hashCode=26594 hashCode=26595 hashCode=26596 ...
I would never expect that hashCodes from a collection would exhibit such a pattern unless the JVM or thrift creates hashCodes on the fly for some objects, in certain instances (or maybe, each object created internally has a hashCode that is set to a statically incremented value).
Of course, this question has no definite answer at this point - but, in general, is there a reason or a common case where a stream of objects would have incrementally increasing hashCodes ? Maybe if there is a scenario where somebody has seen such a phenomenon in the past, it might help me shed light on the binary file which I'm trying to understand.
Upvotes: 0
Views: 432
Reputation: 4014
Could they be a sequence of numbers?
Looking at the code for Integer and Long, their hash codes are essentially that number and consecutive numbers will pretty much have consecutive hash codes.
Note that Long will only be consecutive up to Integer.MAX_VALUE, after that it is not as consecutive, though still well patterned.
Upvotes: 1
Reputation: 116888
is there a reason or a common case where a stream of objects would have incrementally increasing hashCodes ? Maybe if there is a scenario where somebody has seen such a phenomenon in the past, it might help me shed light on the binary file which I'm trying to understand.
The short answer is that it is interesting but certainly not wrong. The object's class in question is generating the hashCode()
-- it is nothing to do with the serialization unless for some reason the hashcode value has been calculated during object construction which would be much more bizarre.
You have to remember that the hashcode is typically used with a mod function to place a value in a hash-bucket. As long as the value returned by the hashCode()
method obeys the specs, it is fine:
It may be that it is using some sort of database ID that was generated and are monotonically increasing on purpose. Or this is some sort of Hadoop pattern to track unique results or something.
Upvotes: 1
Reputation: 28168
If you need to check for duplicates you should use the equals
method instead of hashCode
. If you read the javadoc for Object.hashCode
, it says:
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.
This means that you can have two Objects o1 and o2 with the same hashCode value but where o1.equals(o2) = false
. You'll be detecting a false duplicate.
To check for duplicates you can use a Set, and check for each added object if Set.add(object) == true
. If it returns false, it means that it was already in the set.
The incremental hash in your description seems to me a very bad hash function, unless all the objects are the same class and are there's also an incremental relation between them. For instance, run this code:
List l1 = Arrays.asList(1,2,3,4,5,6,7,8,9);
for (Object object : l1) {
System.out.println("hashCode: " + object.hashCode());
}
You are not saying if the objects are your own defined classes not. If they were yours, always remember that if you override equals
, you should always override hashCode
as well. If not, you are violating the hashCode
contract and some classes (like the hashed collections) may not behave as you'd expect.
Upvotes: 1