Reputation: 712
Internally, HashMaps use a hash function to find the bin to which a queried key belongs. Each of these bins is itself a LinkedList.
I don't understand how access time can be constant if these LinkedLists could get very long, and LinkedLists don't have constant access time, but linear access time instead.
How does the Java Collections Library manage to guarantee constant access time even if bins get too large for some reason? What is going on internally? What does Java do internally to minimize the negative effects of this?
Upvotes: 2
Views: 1233
Reputation: 15398
The documentation tells you what's going on if the load factor gets too high:
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
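Both parameters can be passed to the constructor. As a minimal sketch (the capacity of 32 and load factor of 0.5 are arbitrary example values):

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // Start with 32 buckets; rehash once more than 32 * 0.5 = 16 entries are stored.
        Map<String, Integer> map = new HashMap<>(32, 0.5f);
        for (int i = 0; i < 17; i++) {
            map.put("key" + i, i); // the 17th put crosses the threshold and triggers a resize
        }
        System.out.println(map.size()); // 17
    }
}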
Additionally, you can look at the source code, which features quite a list of implementation notes. Most importantly:
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap.
and further on:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins.
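To see what the tree bins buy you, here is a hedged sketch (the key class and the entry count are made up for illustration). Every key reports the same hash code, so all entries collide into one bin; because the key is also Comparable, a Java 8+ HashMap can turn that bin into a red-black tree and keep lookups logarithmic rather than linear:

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // Deliberately broken hashing: every instance hashes to 42, so all entries share one bin.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(new BadKey(i), i); // all 100,000 entries land in the same bin
        }
        // On Java 8+, that bin was converted to a tree of TreeNodes once it grew past
        // TREEIFY_THRESHOLD, so this lookup walks O(log n) nodes instead of an O(n) list.
        System.out.println(map.get(new BadKey(99_999)));
    }
}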
So in short: O(ln(t)) search, with t the size of the bin. So large bins have a binary tree dangling from them.
Upvotes: 2
Reputation: 726809
I don't understand how access time can be constant if these linked lists could get very long
HashMap does not offer guaranteed constant access time. It offers amortized constant time, which is a different thing: overall access for n items would be O(1) on the average, yet each individual access may be O(n).
Moreover, amortized constant time is achieved only when the hash function is "good". When the hash function is bad (for example, returning a constant, which is a valid, but extremely bad, hash function) the library is helpless: the access time is going to be linear, no matter what the implementation is trying to do.
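A concrete way to picture "amortized": most put calls only touch a single bucket, but the occasional call that crosses the resize threshold rebuilds the whole table. A rough sketch (the entry count is arbitrary):

import java.util.HashMap;
import java.util.Map;

public class AmortizedDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 0; i < 1_000_000; i++) {
            // Most of these puts are O(1): hash the key, pick a bucket, store the entry.
            // The few puts that push the size past capacity * loadFactor rehash the whole
            // table, which is O(n) for that single call; summed over all n puts the total
            // rehashing work is still O(n), hence O(1) per operation on average.
            map.put(i, i);
        }
        System.out.println(map.size());
    }
}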
A linked list will grow when multiple hash codes are the same modulo the number of buckets. However, since HashMap uses power-of-two bucket counts and spreads the hash bits before choosing a bucket, the most common case for a linked list to become very long is that many hash codes are actually identical, without taking the modulo into consideration. Therefore, simply increasing the number of buckets is not going to reduce the length of the list: it would either move the list to a different bucket, or leave it at its old location, but the length of the list will not be reduced.
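For example, two distinct String keys can share a hash code ("Aa" and "BB" are a classic pair), and no amount of resizing will ever separate them into different buckets:

public class CollisionDemo {
    public static void main(String[] args) {
        // Identical hash codes for different keys: every bucket count maps them together.
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false
    }
}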
Upvotes: 2
Reputation: 3620
If the hash table gets too full, then it needs to be rehashed. To rehash the table, another table with more buckets is created, and all the elements are inserted into the new table. The original table is discarded.
The load factor determines when it is rehashed. The default is 0.75, so when the table is more than 75% full, it is automatically rehashed with twice as many buckets.
To find a place in the table, the hash code is computed and reduced modulo the number of buckets. The idea is the hash function should somewhat randomly distribute the objects, so the number of collisions should be low, and so there shouldn't be too many comparisons.
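In rough pseudocode terms (a toy version, not the actual library code; the names and bucket counts are illustrative), bucket selection and the doubling rehash look something like this:

public class BucketSketch {
    // Toy reduction of an arbitrary hash code to a bucket index.
    static int bucketIndex(Object key, int bucketCount) {
        return (key.hashCode() & 0x7fffffff) % bucketCount;
    }

    public static void main(String[] args) {
        int buckets = 16;
        String key = "example";
        System.out.println("before rehash: bucket " + bucketIndex(key, buckets));
        // When the table is more than 75% full, it is rebuilt with twice as many
        // buckets and every key is re-bucketed against the new count.
        buckets *= 2;
        System.out.println("after rehash:  bucket " + bucketIndex(key, buckets));
    }
}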
Upvotes: 1
Reputation: 393916
The average number of elements in each bin is bounded by a small constant. This is maintained by keeping the number of bins at least as high as the total number of entries divided by the load factor (whose default value is 0.75).
The number of bins is increased with the number of entries in order to keep this invariant.
Here's the relevant code (Java 7):

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Prepend the new entry to the chain in its bucket.
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // Once the entry count reaches the threshold, double the number of bins.
    if (size++ >= threshold)
        resize(2 * table.length);
}
Where size is the number of entries, table.length is the number of bins and threshold is table.length * loadFactor.
If you use the default load factor of 0.75 (or any load factor < 1), the number of bins will always be higher than the number of entries, so unless you have a really bad hashCode for your key class, each bin won't have more than one entry on average.
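To get a feel for that claim, here is a rough simulation (the entry count and the power-of-two sizing are illustrative assumptions, and random integers stand in for well-distributed hash codes):

import java.util.Random;

public class OccupancySketch {
    public static void main(String[] args) {
        int entries = 100_000;
        // Smallest power of two that keeps the load at or below 0.75 for this many entries.
        int bins = Integer.highestOneBit((int) Math.ceil(entries / 0.75) - 1) << 1;
        int[] counts = new int[bins];
        Random rnd = new Random(1);
        for (int i = 0; i < entries; i++) {
            int hash = rnd.nextInt();             // stand-in for a decent hashCode()
            counts[(hash & 0x7fffffff) % bins]++; // toy bucket selection
        }
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        System.out.println("average entries per bin: " + (double) entries / bins);
        System.out.println("longest chain: " + max);
    }
}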
Upvotes: 5