Reputation: 31
What is the time and space complexity of the new implement of hashmap in java 8 (with the tree improvement) and is it concurrent?
Is there an O(n) complexity, for example when the tree in the bucket is not balanced?
Where can I find this implement?
Thank you.
Upvotes: 1
Views: 3001
Reputation: 544
Hash collisions have negative impact on the lookup time of HashMap. When multiple keys end up in the same bucket, then values along with their keys are placed in a linked list. In case of retrieval, linked list has to be traversed to get the entry. In worst case scenario, when all keys are mapped to the same bucket, the lookup time of HashMap increases from O(1) to O(n).
Java 8 has come with the following improvements/changes of HashMap objects in case of high collisions.
The alternative String hash function added in Java 7 has been removed. Buckets containing a large number of colliding keys will store their entries in a balanced tree instead of a linked list after certain threshold is reached. Above changes ensure performance of O(log(n)) in worst case scenarios (hash function is not distributing keys properly) and O(1) with proper hashCode().
Upvotes: 1
Reputation: 43150
If you actually look at the source - which is bundled with the JDK and should be automatically opened by the IDE if you look at the class, you will see the comment quoted below.
Note: This is an implementation detail, thus subject to change and does not apply to all Java implementations, e.g. it might be different on android. The comment is from OpenJDK 1.8
/* [...] * This map usually acts as a binned (bucketed) hash table, but * when bins get too large, they are transformed into bins of * TreeNodes, each structured similarly to those in * java.util.TreeMap. Most methods try to use normal bins, but * relay to TreeNode methods when applicable (simply by checking * instanceof a node). Bins of TreeNodes may be traversed and * used like any others, but additionally support faster lookup * when overpopulated. However, since the vast majority of bins in * normal use are not overpopulated, checking for existence of * tree bins may be delayed in the course of table methods. * * Tree bins (i.e., bins whose elements are all TreeNodes) are * ordered primarily by hashCode, but in the case of ties, if two * elements are of the same "class C implements Comparable<C>", * type then their compareTo method is used for ordering. (We * conservatively check generic types via reflection to validate * this -- see method comparableClassFor). The added complexity * of tree bins is worthwhile in providing worst-case O(log n) * operations when keys either have distinct hashes or are * orderable, Thus, performance degrades gracefully under * accidental or malicious usages in which hashCode() methods * return values that are poorly distributed, as well as those in * which many keys share a hashCode, so long as they are also * Comparable. (If neither of these apply, we may waste about a * factor of two in time and space compared to taking no * precautions. But the only known cases stem from poor user * programming practices that are already so slow that this makes * little difference.) [...] */
Note the O(log n) tree bin behavior being conditional on properties of the inserted keys.
Also note that Big-O notations generally omit another factor that depends on the complexity of the compare operation, e.g. if they are string comparisons they also depend on the common prefix length of the strings.
Upvotes: 3