Reputation: 1144
I am asking this question with respect to Java versions up to 1.7 only. I am using reflection to find out the current capacity of a HashMap. In the program below I am putting 12 unique Person objects into a single bucket of the HashMap (by giving them the same hashcode). Then I put a 13th unique Person into either the same or a different bucket (using the same or a different hashcode). In both cases, after adding this 13th element the HashMap resizes to 32 buckets. I understand that, due to the load factor of 0.75 and the initial capacity of 16, the HashMap doubles in size on the 13th element. But there are still empty buckets available, and only 2 buckets are actually used by these 13 elements.
My questions are:
Is my understanding correct? Am I making a mistake somewhere? Is this the expected behavior of HashMap?
If all this is correct, then even though there are 11 or 12 free buckets, why does the HashMap need to double with the 13th element in this case? Isn't resizing the HashMap extra overhead, or costly? What is the need to double it here, when the 13th element could be placed in any available bucket according to its hashcode?
import java.lang.reflect.Field;
import java.util.HashMap;

public class HashMapTest {

    public static void main(String[] args)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        HashMap<Person, String> hm = new HashMap<Person, String>();
        for (int i = 1; i <= 12; i++) {
            // 12 entries, all with the same hashcode, land in one bucket (linked list)
            hm.put(new Person(), "1");
        }
        System.out.println("Number of Buckets in HashMap : " + bucketCount(hm));
        System.out.println("Number of Entry in HashMap : " + hm.size());
        System.out.println("**********************************");
        // 13th entry with a different hashcode goes into a different bucket
        hm.put(new Person(2), "2");
        System.out.println("Number of Buckets in HashMap : " + bucketCount(hm));
        System.out.println("Number of Entry in HashMap : " + hm.size());
    }

    // Uses reflection to read the length of HashMap's internal "table" array
    public static int bucketCount(HashMap<Person, String> h)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(h);
        return table == null ? 0 : table.length;
    }
}

class Person {
    int age = 0;

    Person() {
    }

    Person(int a) {
        age = a;
    }

    // Always returns false, so every Person instance is a distinct key
    @Override
    public boolean equals(Object obj) {
        return false;
    }

    // Default Person (age 0) hashes to 0; Person(2) hashes to 1
    @Override
    public int hashCode() {
        if (age != 0) {
            return 1;
        } else {
            return age;
        }
    }
}
OUTPUT
Number of Buckets in HashMap : 16
Number of Entry in HashMap : 12
**********************************
Number of Buckets in HashMap : 32
Number of Entry in HashMap : 13
Upvotes: 4
Views: 1712
Reputation: 120858
There is one more subtle aspect here: while resizing the internal array (it goes from 16 to 32), you are also "touching" all the entries. Let me explain:
When there are 16 buckets (the internal array has size 16), only the last 4 bits of the hash decide which bucket an entry goes to; think %, but internally it is actually (n - 1) & hash, where n is the number of buckets.
When the internal array grows, one more bit is taken into consideration to decide where an entry goes: there used to be 4 bits, now there are 5. That means all the entries are re-hashed and might move to different buckets; that is why the resizing happens, to disperse the entries.
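To make that concrete, here is a small sketch of the masking idea (this is not the HashMap source itself, and it ignores the supplemental hash function HashMap applies on top of hashCode()):
public class BucketIndexDemo {
    public static void main(String[] args) {
        int hash = 0b10110;                 // example hash value: 22
        int indexWith16 = (16 - 1) & hash;  // only the last 4 bits matter -> 6
        int indexWith32 = (32 - 1) & hash;  // one more bit is considered -> 22
        System.out.println("bucket index with 16 buckets: " + indexWith16);
        System.out.println("bucket index with 32 buckets: " + indexWith32);
    }
}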
If you really want to fill all the "gaps", you can specify a load factor of 1 instead of the default 0.75; but that has implications, as documented in the HashMap constructors.
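For example, reusing the Person key type from the question, such a map could be created like this (a sketch only; with capacity 16 and load factor 1.0 the threshold becomes 16, so in the Java 7 implementation the doubling is deferred until roughly the 17th insertion):
// threshold = 16 * 1.0 = 16, so 13 entries no longer trigger a resize
HashMap<Person, String> hm = new HashMap<Person, String>(16, 1.0f);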
Upvotes: 3
Reputation: 393851
Yes, the behavior you observe is the expected behavior.
The implementation of HashMap expects you to use a reasonable hashCode for the keys. It assumes that your hashCode distributes the keys as evenly as possible among the available buckets. If you fail to do that (as in your example, where all the keys have the same hashCode), you will get bad performance.
Under the assumption of an even distribution, it makes sense for the HashMap to double its size once you pass the load factor. It doesn't check how many buckets are actually empty (since it has no way of knowing whether new entries will be assigned to empty buckets or to occupied ones). It just checks the average number of entries per bucket. Once that number exceeds the load factor, the number of buckets is doubled.
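A conceptual sketch of that decision (not the exact Java 7 source, just the arithmetic it performs):
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor); // 16 * 0.75 = 12
        int size = 12;                                 // entries already present when the 13th put arrives
        if (size >= threshold) {                       // 12 >= 12: the table is doubled
            capacity *= 2;                             // 16 -> 32 buckets
            threshold = (int) (capacity * loadFactor); // new threshold: 24
        }
        System.out.println("capacity after the 13th put: " + capacity);
    }
}
Note that empty buckets never appear in this check; only the total entry count and the threshold do.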
Upvotes: 3