jarosik
jarosik

Reputation: 4564

LinkedList search performance in HashMap

I'm almost sure I have seen such question somewhere but I can't find it. How come HashMap get performance is O(1) if in order to find a key by equals iteration over LinkedList is performed and this process is O(n).

EDIT:

What I figured out is get() performance is actually between O(1) when there are no collisions and O(n) when every key is in a collision. Is this right?

Upvotes: 2

Views: 1051

Answers (3)

Stephen C
Stephen C

Reputation: 718768

@Eran's answer explains why a HashMap gives O(1) "on average". The key point that he didn't spell out is that HashMap will automatically resize the hash array and redistribute the hash chains when the ratio of array size to number of entries exceeds a (configurable) load factor.

If the hash function and the keys are well behaved, then hash chains are short, and the average time to search chains is O(1).

There are three scenarios where that analysis breaks down:

  1. If the hash function is poor, you may find that many / most of the keys have the same hash value, and end up on the same hash chain. Than can lead to O(N) search times in the worst case.

  2. The application may have to deal with a set of keys that fortuitously all have the same hashcode. You can also get that if someone deliberately selects keys which hash to the same hashcode in order to make some hash chains long. (Think ... denial of service attack.)

  3. The HashMap object's hash array is limited by the maximum size of a Java array. When the array reaches the maximum size, resizing is no longer possible. Hence, for really large maps (billions of entries), lookup times switch from being O(1) to O(N) (with a very small C).

Each one of these scenarios can potentially be a problem, so in Java 8 they made some significant changes to the way that HashMap is implemented. In the Java 8 version, if a hash chain gets long enough, HashMap will switch from using a linked list for the chain to using a balanced binary tree. This changes the worst case behavior from O(N) to O(logN) lookups. The caveat is that this only works when the keys in a chain all implement Comparable.

Upvotes: 2

Francisco Hernandez
Francisco Hernandez

Reputation: 2468

The hash based objects will determine in which bucket they will store the key-value pair based on hash value. Inside each bucket there is a structure (In HashMap case a LinkedList) in where the pair is stored.

If the hash value is usually the same, the bucket will be usually the same so the performance will degrade a lot, let´s see an example:

Consider this class

package hashTest;

import java.util.Hashtable;

public class HashTest {

    public static void main (String[] args) {

        Hashtable<MyKey, String> hm = new Hashtable<>();

        long ini = System.currentTimeMillis();

        for (int i=0; i<100000; i++) {
            MyKey a = new HashTest().new MyKey(String.valueOf(i));

            hm.put(a, String.valueOf(i));
        }

        System.out.println(hm.size());

        long fin = System.currentTimeMillis();
        System.out.println("tiempo: " + (fin-ini) + " mls");
    }

    private class MyKey {

        private String str;

        public MyKey(String i) {
            str = i;
        }

        public String getStr() {
            return str;
        }

        @Override
        public int hashCode() {
            return 0;
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof MyKey) {
                MyKey aux = (MyKey) o;
                if (this.str.equals(aux.getStr())) {
                    return true;
                }
            }
            return false;
        }
    }
}

Note that hashCode in class MyKey returns always '0' as hash. It is ok with the hashcode definition (http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()). If we run that program, this is the result

100000 
tiempo: 62866 mls

Is a very poor performance, now we are going to change the MyKey hashcode code:

package hashTest;

import java.util.Hashtable;

public class HashTest {

    public static void main (String[] args) {

        Hashtable<MyKey, String> hm = new Hashtable<>();

        long ini = System.currentTimeMillis();

        for (int i=0; i<100000; i++) {
            MyKey a = new HashTest().new MyKey(String.valueOf(i));

            hm.put(a, String.valueOf(i));
        }

        System.out.println(hm.size());

        long fin = System.currentTimeMillis();
        System.out.println("tiempo: " + (fin-ini) + " mls");
    }

    private class MyKey {

        private String str;

        public MyKey(String i) {
            str = i;
        }

        public String getStr() {
            return str;
        }

        @Override
        public int hashCode() {
            return str.hashCode() * 31;
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof MyKey) {
                MyKey aux = (MyKey) o;
                if (this.str.equals(aux.getStr())) {
                    return true;
                }
            }
            return false;
        }
    }
}

Note that only hashcode in MyKey has changed, now when we run the code te result is

100000
tiempo: 47 mls

There is an incredible better performance now with a minor change. Is a very common practice return the hashcode multiplied by a prime number (in this case 31), using the same hashcode members that you use inside equals method in order to determine if two objects are the same (in this case only str).

The key for an optimal performance is to choose the best hashcode and equals implementation possible in the class that acts as Key in HashMap.

Upvotes: 2

Eran
Eran

Reputation: 393781

The expected performance of get is O(1). The expected performance is computed under the assumption that the hashCode method distributes the keys uniformly among the buckets of the HashMap, so the average size of each linked list (i.e. the average number of entries in each bucket) is very small, so we can assume each such list can be traversed in a time bound by a small constant.

At the worst case scenario, if a bad hashCode maps all the keys to the same bucket, get would take O(n).

Upvotes: 2

Related Questions