Jim
Jim

Reputation: 19562

Hash map keys are not random

It seems that Java's implementation of the HashMap always places the keys to the same bins (at least I saw that with Integer keys). I.e. the hashing is deterministic and in all runs it produces the same value.
I have heard that some languages randomize the insertions so that in which bucket a key will be stored is unpredictable for security reasons.
Why is Java the keys are always the same?

Upvotes: 3

Views: 1015

Answers (4)

leventov
leventov

Reputation: 15273

To complete other answers, I would mention that when HashMap's keys use built-in Object.hashCode, i. e. don't override this method, that is a pretty often case, Object's hashCode is computed using random number generator, that makes whole system behaivour less deterministic. See this question: Java Object.hashCode() - address or random()? for more details.

Upvotes: 0

Brett Kail
Brett Kail

Reputation: 33936

This is not true for Java 7, which added a unique hash seed to each HashMap instance. There's more information on the Collections Framework Enhancements in Java SE 7 page.

This mechanism was removed in Java 8 for performance, and it was replaced by an alternative that converts comparable keys (such as String) into a balanced tree to elide the DoS security problem. There's more information on the Collections Framework Enhancements in Java SE 8 page.

Upvotes: 5

Tom Hawtin - tackline
Tom Hawtin - tackline

Reputation: 147154

The attack of interest here is Denial of Service (DoS). An adversary chooses a set of keys that hit the same bucket. This transforms the performance of map operations from O(1) to O(n). Do this n times (to construct the map say), and we go from O(n) to O(n^2). There's also the possibility of timing attacks, but I'll conveniently ignore that.

In general, most library code will assume no action is necessary to avoid DoS. However, recently some Java implementations have used MURMUR hashing to randomise the hash function for String to avoid certain attacks. MURMUR mixes a per-process random number into the generation of the hash code such that the function is stable for the process, but is difficult (though not necessarily impossible) to figure out from the outside. More recently this has been replaced with falling back to a tree structure if there are excessive collisions and the key implements Comparable appropriately.

If you're worried about such attacks in the situation you find your code, you can use other Map implementations such as java.util.TreeMap.

Upvotes: 5

StriplingWarrior
StriplingWarrior

Reputation: 156524

In Java, the individual class is responsible for implementing a default hashCode() method (or inheriting one from the Object class), and complicated security scenarios like defeating advanced DoS attacks are not the responsibility of classes like Object, Integer, etc.

So most classes use a very simple, fast implementation that tries to ensure fairly even distribution in common cases.

If you have a case where you feel it's important to implement a custom hashing strategy, whether it's because you want to avoid hacking, or because you know your particular usage is likely to cause a lot of collisions with the default method, you can use a collection like Gnu Trove's THashMap, which allows you to provide a custom hashing strategy specific to the collection instance.

Upvotes: 1

Related Questions