Reputation: 1814
I'm going through the HashMap class in Java. My understanding is that the capacity of the hash table is 2 to the power of the number of buckets (capacity 16 means four buckets). When put(key, value) is called, key.hashCode() produces an integer, and the newly added (key, value) pair is placed based on key.hashCode() % numberOfBuckets. But the following is the actual implementation in HashMap.class:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
From the above code, I'm not able to figure out how the key.hashCode() value is fitted into buckets.
Upvotes: 0
Views: 407
Reputation: 115
This may help a bit. Suppose we add ten entries whose keys are the float values 1 to 10:
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FloatKeyDemo {
    public static void main(String[] args) {
        Map<Float, String> map = new HashMap<>();
        int numOfBuckets = 64; // HashMap will have 64 bins after inserting these 10 items
        String format = "|%1$-5s|%2$-35s|%3$-25s|%4$-35s|%5$-25s|%6$-25s|\n";
        System.out.format(format, "i", "raw binary", "right-shifted 16 bits", "rehashed",
                "bucket before rehashed", "bucket after rehashed");
        for (int i = 1; i <= 10; i++) {
            float key = i;
            int rawInt = Float.floatToRawIntBits(key);   // same bits as Float.hashCode() here
            String binaryString = Long.toBinaryString(rawInt);
            String shifted16BitsString = Long.toBinaryString(rawInt >>> 16);
            int rehashed = rawInt ^ (rawInt >>> 16);     // what HashMap.hash() computes
            String rehashedString = Long.toBinaryString(rehashed);
            // HashMap calculates the bin index with (n - 1) & hash
            String bucketBeforeRehashed = Long.toString((numOfBuckets - 1) & Objects.hashCode(key));
            String bucketAfterRehashed = Long.toString((numOfBuckets - 1) & rehashed);
            System.out.format(format, i, binaryString, shifted16BitsString, rehashedString,
                    bucketBeforeRehashed, bucketAfterRehashed);
            map.put(key, Integer.toString(i));
        }
    }
}
which produces:
|i |raw binary |right-shifted 16 bits |rehashed |bucket before rehashed |bucket after rehashed |
|1 |111111100000000000000000000000 |11111110000000 |111111100000000011111110000000 |0 |0 |
|2 |1000000000000000000000000000000 |100000000000000 |1000000000000000100000000000000 |0 |0 |
|3 |1000000010000000000000000000000 |100000001000000 |1000000010000000100000001000000 |0 |0 |
|4 |1000000100000000000000000000000 |100000010000000 |1000000100000000100000010000000 |0 |0 |
|5 |1000000101000000000000000000000 |100000010100000 |1000000101000000100000010100000 |0 |32 |
|6 |1000000110000000000000000000000 |100000011000000 |1000000110000000100000011000000 |0 |0 |
|7 |1000000111000000000000000000000 |100000011100000 |1000000111000000100000011100000 |0 |32 |
|8 |1000001000000000000000000000000 |100000100000000 |1000001000000000100000100000000 |0 |0 |
|9 |1000001000100000000000000000000 |100000100010000 |1000001000100000100000100010000 |0 |16 |
|10 |1000001001000000000000000000000 |100000100100000 |1000001001000000100000100100000 |0 |32 |
What we can see from the output is that the lower bits of these keys' hash codes are all 0s, which causes every item to be assigned to the same bin. The distribution becomes much better after the right shift and XOR. I think this is exactly the case described in the comment of the hash() method in HashMap's source code.
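As a side note, the spread only starts to separate these keys once the table grows. Below is a minimal standalone sketch (the spread helper mirrors hash(Object) but is my own replica, not HashMap's internals) that prints the index of key 5.0f for a few power-of-two table lengths:
public class SpreadGrowthDemo {
    // Standalone replica of HashMap.hash(Object), for illustration only
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = spread(5.0f);                   // 5.0f autoboxes to Float; hashCode() == floatToIntBits(5.0f)
        for (int n : new int[] {16, 32, 64}) {  // power-of-two table lengths
            System.out.println(n + " buckets -> index " + ((n - 1) & h));
        }
        // prints index 0 for 16 and 32 buckets, and 32 for 64 buckets,
        // consistent with the "64 bins" comment in the code above
    }
}
As far as I understand, because all ten keys keep colliding in one bin for small tables, HashMap resizes (rather than treeifying a table smaller than 64 slots) until the table reaches 64 slots, which is where the spread finally starts to matter.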
Upvotes: 2
Reputation: 5558
That code does not "fit" the hashCode to buckets; it "just" spreads the hash code so that the higher bits also influence the bucket index. Here is the Javadoc of that method:
Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
The actual fitting to buckets is done in the getNode(int, Object) method:
first = tab[(n - 1) & hash]
where hash is the result of hash(Object) and n is the length of the table (the number of buckets), which is always a power of two.
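To make the quoted Javadoc concrete, here is a small sketch (my own helper, not HashMap's API) using three Integer keys whose hash codes differ only in bits above a 16-slot mask: without the XOR they all map to index 0, with it they spread out.
public class SpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);  // same transform hash(Object) applies for non-null keys
    }

    public static void main(String[] args) {
        int n = 16;  // table length (power of two), so the mask is n - 1
        for (int key : new int[] {0x10000, 0x20000, 0x30000}) {
            int raw = Integer.hashCode(key);  // Integer's hashCode is the value itself
            System.out.printf("key=%#x  raw index=%d  spread index=%d%n",
                    key, (n - 1) & raw, (n - 1) & spread(raw));
        }
        // raw index is 0 for all three keys; spread index is 1, 2 and 3
    }
}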
Upvotes: 1