Reputation: 1144
I am asking this question with respect to Java versions up to 1.7 only. I am using reflection to find out the current capacity of a HashMap. In the program below I am putting 12 unique Person objects into a single bucket of the HashMap (by giving them the same hashcode). Then I put a 13th unique Person into either the same or a different bucket (using the same or a different hashcode). In both cases, after adding this 13th element the HashMap resizes to 32 buckets. I understand that, due to the load factor of 0.75 and the initial capacity of 16, the HashMap doubles in size on the 13th element. But there are still empty buckets available, and only 2 buckets are actually used by these 13 elements.
My questions are:
Is my understanding correct? Am I making a mistake somewhere? Is this the expected behavior of HashMap?
If all this is correct, then even though there are 11 or 12 free buckets, why does the HashMap need to double with the 13th element in this case? Isn't resizing the HashMap extra overhead, or costly? What is the need to double it here, when the 13th element could be placed in any available bucket according to its hashcode?
import java.lang.reflect.Field;
import java.util.HashMap;

public class HashMapTest {

    public static void main(String[] args)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        HashMap<Person, String> hm = new HashMap<Person, String>();
        for (int i = 1; i <= 12; i++) {
            // 12 entries, all with the same hashcode, land in one bucket (linked list)
            hm.put(new Person(), "1");
        }
        System.out.println("Number of Buckets in HashMap : " + bucketCount(hm));
        System.out.println("Number of Entry in HashMap : " + hm.size());
        System.out.println("**********************************");
        // 13th entry with a different hashcode goes into a different bucket
        hm.put(new Person(2), "2");
        System.out.println("Number of Buckets in HashMap : " + bucketCount(hm));
        System.out.println("Number of Entry in HashMap : " + hm.size());
    }

    // Uses reflection to read the length of HashMap's internal "table" array
    public static int bucketCount(HashMap<Person, String> h)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(h);
        return table == null ? 0 : table.length;
    }
}

class Person {
    int age = 0;

    Person() {
    }

    Person(int a) {
        age = a;
    }

    // Always returns false, so every Person instance is a distinct key
    @Override
    public boolean equals(Object obj) {
        return false;
    }

    // Default Person (age 0) hashes to 0; Person(2) hashes to 1
    @Override
    public int hashCode() {
        if (age != 0) {
            return 1;
        } else {
            return age;
        }
    }
}
OUTPUT
Number of Buckets in HashMap : 16
Number of Entry in HashMap : 12
**********************************
Number of Buckets in HashMap : 32
Number of Entry in HashMap : 13
Upvotes: 4
Views: 1712
Reputation: 120858
There is one more subtle aspect here: while resizing the internal array (it goes from 16 to 32), you are also "touching" all the entries. Let me explain:
When there are 16 buckets (the internal array has size 16), only the last 4 bits of the hash decide which bucket an entry goes to; think %, but internally it is actually (n - 1) & hash, where n is the number of buckets.
When the internal array grows, one more bit is taken into consideration to decide where an entry goes: there used to be 4 bits, now there are 5. That means all the entries are re-hashed and might move to different buckets; that is why the resizing happens, to disperse the entries.
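To make that concrete, here is a small sketch of the masking idea (this is not the HashMap source itself, and it ignores the supplemental hash function HashMap applies on top of hashCode()):
public class BucketIndexDemo {
    public static void main(String[] args) {
        int hash = 0b10110;                 // example hash value: 22
        int indexWith16 = (16 - 1) & hash;  // only the last 4 bits matter -> 6
        int indexWith32 = (32 - 1) & hash;  // one more bit is considered -> 22
        System.out.println("bucket index with 16 buckets: " + indexWith16);
        System.out.println("bucket index with 32 buckets: " + indexWith32);
    }
}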
If you really want to fill all the "gaps", you can specify a load factor of 1 instead of the default 0.75; but that has implications, as documented in the HashMap constructors.
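For example, reusing the Person key type from the question, such a map could be created like this (a sketch only; with capacity 16 and load factor 1.0 the threshold becomes 16, so in the Java 7 implementation the doubling is deferred until roughly the 17th insertion):
// threshold = 16 * 1.0 = 16, so 13 entries no longer trigger a resize
HashMap<Person, String> hm = new HashMap<Person, String>(16, 1.0f);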
Upvotes: 3
Reputation: 393851
Yes, the behavior you observe is the expected behavior.
The implementation of HashMap expects you to use a reasonable hashCode for the keys. It assumes that your hashCode distributes the keys as evenly as possible among the available buckets. If you fail to do that (as in your example, where all the keys have the same hashCode), you will get bad performance.
Under the assumption of an even distribution, it makes sense for the HashMap to double its size once you pass the load factor. It doesn't check how many buckets are actually empty (since it has no way of knowing whether new entries will be assigned to empty buckets or to occupied ones). It just checks the average number of entries per bucket. Once that number exceeds the load factor, the number of buckets is doubled.
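A conceptual sketch of that decision (not the exact Java 7 source, just the arithmetic it performs):
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor); // 16 * 0.75 = 12
        int size = 12;                                 // entries already present when the 13th put arrives
        if (size >= threshold) {                       // 12 >= 12: the table is doubled
            capacity *= 2;                             // 16 -> 32 buckets
            threshold = (int) (capacity * loadFactor); // new threshold: 24
        }
        System.out.println("capacity after the 13th put: " + capacity);
    }
}
Note that empty buckets never appear in this check; only the total entry count and the threshold do.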
Upvotes: 3