Reputation: 8463
I am wondering if we implement our own hashmap that doesn't use power-of-two length hash tables (initial capacity and whenever we re-size), then in that case can we just use the object's hashcode and mod the total size directly instead of use a hash function to hash the object's hashcode ?
for example
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
// int hash = hash(key.hashCode()); original way
//can we just use the key's hashcode if our table length is not power-of-two ?
int hash = key.hashCode();
int i = indexFor(hash, table.length);
...
...
}
Upvotes: 3
Views: 2712
Reputation: 29636
Presuming we're talking about OpenJDK 7, the additional hash
is used to stimulate avalanching; it is a mixing function. It is used because the mapping function from a hash to a bucket, since were using a power of 2 for the capacity, is a mere bitwise &
(since a % b
is equivalent to a & (b - 1)
iff b
is a power of 2); this means that the lower bits are the only important ones, so by applying this mixing step it can help protect against poorer hashes.
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
If you want to use sizes that aren't powers of 2, the above may not be needed.
Actually changing the mapping from hashes to buckets (which normally relies on the capacity being a power of 2) will require you to you to look at indexFor
:
static int indexFor(int h, int length) {
return h & (length-1);
}
You could use (h & 0x7fffffff) % length
here.
Upvotes: 3
Reputation: 533432
This is true, you can pick pseudo-prime numbers instead.
Note: indexFor needs to use %
compensating for the sign instead of a simple &
which can actually make the lookup slower.
indexFor = (h & Integer.MAX_VALUE) % length
// or
indexFor = Math.abs(h % length)
Upvotes: 1
Reputation: 607
You can think of the mod function as a simple form of hash function. It maps a large range of data to a smaller space. Assuming the original hashcode is well designed, I see no reason why a mod cannot be used to transform the hashcode into the size of the table you are using.
If your original hashfunction is not well implemented, e.g. always returns an even number, you will create quite a lot of collisions using just a mod function as your hashfunction.
Upvotes: 1