Reputation: 4564
I'm almost sure I have seen such question somewhere but I can't find it. How come HashMap get performance is O(1) if in order to find a key by equals iteration over LinkedList is performed and this process is O(n).
EDIT:
What I figured out is get() performance is actually between O(1) when there are no collisions and O(n) when every key is in a collision. Is this right?
Upvotes: 2
Views: 1051
Reputation: 718768
@Eran's answer explains why a HashMap
gives O(1)
"on average". The key point that he didn't spell out is that HashMap
will automatically resize the hash array and redistribute the hash chains when the ratio of array size to number of entries exceeds a (configurable) load factor.
If the hash function and the keys are well behaved, then hash chains are short, and the average time to search chains is O(1)
.
There are three scenarios where that analysis breaks down:
If the hash function is poor, you may find that many / most of the keys have the same hash value, and end up on the same hash chain. Than can lead to O(N)
search times in the worst case.
The application may have to deal with a set of keys that fortuitously all have the same hashcode. You can also get that if someone deliberately selects keys which hash to the same hashcode in order to make some hash chains long. (Think ... denial of service attack.)
The HashMap
object's hash array is limited by the maximum size of a Java array. When the array reaches the maximum size, resizing is no longer possible. Hence, for really large maps (billions of entries), lookup times switch from being O(1)
to O(N)
(with a very small C
).
Each one of these scenarios can potentially be a problem, so in Java 8 they made some significant changes to the way that HashMap
is implemented. In the Java 8 version, if a hash chain gets long enough, HashMap
will switch from using a linked list for the chain to using a balanced binary tree. This changes the worst case behavior from O(N)
to O(logN)
lookups. The caveat is that this only works when the keys in a chain all implement Comparable
.
Upvotes: 2
Reputation: 2468
The hash based objects will determine in which bucket they will store the key-value pair based on hash value. Inside each bucket there is a structure (In HashMap case a LinkedList) in where the pair is stored.
If the hash value is usually the same, the bucket will be usually the same so the performance will degrade a lot, let´s see an example:
Consider this class
package hashTest;
import java.util.Hashtable;
public class HashTest {
public static void main (String[] args) {
Hashtable<MyKey, String> hm = new Hashtable<>();
long ini = System.currentTimeMillis();
for (int i=0; i<100000; i++) {
MyKey a = new HashTest().new MyKey(String.valueOf(i));
hm.put(a, String.valueOf(i));
}
System.out.println(hm.size());
long fin = System.currentTimeMillis();
System.out.println("tiempo: " + (fin-ini) + " mls");
}
private class MyKey {
private String str;
public MyKey(String i) {
str = i;
}
public String getStr() {
return str;
}
@Override
public int hashCode() {
return 0;
}
@Override
public boolean equals(Object o) {
if (o instanceof MyKey) {
MyKey aux = (MyKey) o;
if (this.str.equals(aux.getStr())) {
return true;
}
}
return false;
}
}
}
Note that hashCode in class MyKey returns always '0' as hash. It is ok with the hashcode definition (http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()). If we run that program, this is the result
100000
tiempo: 62866 mls
Is a very poor performance, now we are going to change the MyKey hashcode code:
package hashTest;
import java.util.Hashtable;
public class HashTest {
public static void main (String[] args) {
Hashtable<MyKey, String> hm = new Hashtable<>();
long ini = System.currentTimeMillis();
for (int i=0; i<100000; i++) {
MyKey a = new HashTest().new MyKey(String.valueOf(i));
hm.put(a, String.valueOf(i));
}
System.out.println(hm.size());
long fin = System.currentTimeMillis();
System.out.println("tiempo: " + (fin-ini) + " mls");
}
private class MyKey {
private String str;
public MyKey(String i) {
str = i;
}
public String getStr() {
return str;
}
@Override
public int hashCode() {
return str.hashCode() * 31;
}
@Override
public boolean equals(Object o) {
if (o instanceof MyKey) {
MyKey aux = (MyKey) o;
if (this.str.equals(aux.getStr())) {
return true;
}
}
return false;
}
}
}
Note that only hashcode in MyKey has changed, now when we run the code te result is
100000
tiempo: 47 mls
There is an incredible better performance now with a minor change. Is a very common practice return the hashcode multiplied by a prime number (in this case 31), using the same hashcode members that you use inside equals method in order to determine if two objects are the same (in this case only str).
The key for an optimal performance is to choose the best hashcode
and equals
implementation possible in the class that acts as Key in HashMap.
Upvotes: 2
Reputation: 393781
The expected performance of get
is O(1)
. The expected performance is computed under the assumption that the hashCode
method distributes the keys uniformly among the buckets of the HashMap
, so the average size of each linked list (i.e. the average number of entries in each bucket) is very small, so we can assume each such list can be traversed in a time bound by a small constant.
At the worst case scenario, if a bad hashCode
maps all the keys to the same bucket, get
would take O(n)
.
Upvotes: 2