Amit Goel
Amit Goel

Reputation: 157

Hash Code Calculation

I was just going through the concept of hash code and encountered a line multiplying by primes will not tend to shift information away from lower end - as would multiplying by a power of 2

I didn't get this line, can anyone help me with this.

Thanks.

Upvotes: 1

Views: 104

Answers (2)

puhlen
puhlen

Reputation: 8529

In many uses of hashcode, only changes in the least significant part of the hashcode matter. In other words, the difference between 3 and 5 is important, but 3000 and 5000 might as well be the same number.

The reason for this is that hashcode is used to do a rough "sorting" of values into "buckets" based on the value of hashcode. This allows structures like a hashtable to search only within a bucket for a specific value instead of searching through every element in the table.

The thing is, there are over 4 billion possible hashcodes but you will normally have a much smaller number of buckets to put values into.

Imagine a scenario where you are hashing into 10 buckets. Hashcodes 0-9 can all go into separate buckets, but then 10 needs to go into the same bucket as 0, 11 into the same as 1 and so on. If you have hashcodes like 1, 145, 42, 5830 everything is working well because each of those values can be put into a different bucket. With values like 1 ,131, 593021, 63421 on the other hand they all will go to the same bucket because they end in the same number and that's all we are looking at because we only have 10 buckets. So it only changes in the least significant part of our hashcode that really matter to us.

Upvotes: 1

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726987

This advise is given for computing hash code based on multiple fields. It is based on the observation that multiplying by powers of two between 0 and 32 is equivalent to shifting the number left by the corresponding number of bits, thus "zeroing out" the right side of the number.

Consider a situation when you need to construct a hash code of ten fields, and you multiply hash codes of individual fields by 32. This is equivalent to shifting hash codes by five bits to the left. If you do that, the end hash code would not depend on hash codes of the first three fields, because the values of their hash codes would be shifted out of the resultant hash code.

This behavior is not desirable, because items with the last seven fields being the same would have the same hash code, even though the firs three fields could be different. This is bad, because it increases a possibility of hash collisions. In contrast, if you multiply by a prime number above 2, some information about hash values of each field influences the end result, making for a better hash function.

Upvotes: 1

Related Questions