David
David

Reputation: 5214

Writing hashCode methods for heterogeneous keys

I have a Java HashMap whose keys are instances of java.lang.Object, that is: the keys are of different types. The hashCode values of two key objects of different types are likely to be the same when they contain identical variable values.

In order to improve the performance of the get method for my HashMap, I'm inclined to mix the name of the Java type into the hashCode methods of my key objects. I have not seen examples of this elsewhere, and so my this-might-be-wacky alarm went off. Do you think mixing the type into hashCode is a good idea? Should I mix in the class name, or the hashCode of the relevant Class object?

Upvotes: 2

Views: 418

Answers (3)

maaartinus
maaartinus

Reputation: 46492

Years later...

Apart from it being a premature optimization, it's not a bad idea and the overhead is tiny. Choy's recommendation to profile first is surely good in general, but sometimes a simple optimization takes much less time than the profiling. This seems to be such a case.

I'd use a different multiplier as already suggested and mix in getClass().getHashCode().

Or maybe getClass().getName().getHashCode() as it stays consistent across JVM invocations, which might be helpful if you want a reproducible HashMap iteration order for easier debugging. Note that you should never rely on such a reproducibility and that there are quite many things destroying it.

Upvotes: 0

choy
choy

Reputation: 426

I think your this-might-be-wacky alarm should have gone off when you decided to have keys of different types. But let's assume this is a case where Object is really the way to go.

You should try it without mixing in the type name and stress test the performance if you find that this particular lookup is determined to be a hotspot in the system. Chances are the performance doesn't matter that much.

Like Jon implied, the performance of the hash map is improved by reducing collisions. Mixing in the type name is just as likely to increase collisions as it is to reduce them. To keep your hashmap in peak condition, you want the likelihood of any particular hashcode to be about that same as any other over the domain of valid key values. So the probability of a hashcode of 10 should be about the same as the probability of 100 or any other number. That way the hash table buckets fill evenly (in all likelihood). so whether you have an object of type A or type B should not matter. just the probability distribution of the hashcodes of all occurring key values.

Upvotes: 1

Jon Skeet
Jon Skeet

Reputation: 1503280

I wouldn't mix the type name in - but if you're controlling the hashCode algorithm already, why not just change it so that they won't clash? For example, if you're using the common "add and multiply" approach, you could start off with different base cases or use different multipliers.

Before you worry about this too much though, have you actually measured how often you're really getting collisions with real data? Is this definitely a problem, or are you just concerned that it might be a problem?

Upvotes: 3

Related Questions