Reputation: 1526
I am studying about hash string functions for a hasmap in javascript. And looking into this code that I found in the web, I am not sure if the function is correct or not:
HashMap._hashString = function(string) {
var hash = 5381;
for (var i=0; i<string.length; i++) {
hash = (hash << 5) + hash + string.charCodeAt(i);
hash = hash & hash;
}
//Reduce the chance of collisions by incorporating the string length,
//and randomize the hashes to prevent malicious collisions.
return hash ^ string.length ^ this._secret;
};
Does it make any sense to have this line?
hash = hash & hash;
In this line of code:
return hash ^ string.length ^ this._secret;
I understand that adding the length of the string as a factor for the hash to evaluate would help to work with the collisions, but why would I add this factor with a XOR operation? Why not using any other bit operator?
I am also reading about this article, to understand a bit more about the hash algorithms:
http://www.cse.yorku.ca/~oz/hash.html
Upvotes: 0
Views: 1286
Reputation: 350831
Does it make any sense to have this line?
hash = hash & hash;
The purpose of that line is to limit the value to the 32-bit range. hash & hash
looks like a no-op, but applying bitwise operators will clip any overflow. It gives the same result as this:
hash = hash & 0xFFFFFFFF
In this line of code:
return hash ^ string.length ^ this._secret;
I understand that adding the length of the string as a factor for the hash to evaluate would help to work with the collisions, but why would I add this factor with a XOR operation? Why not using any other bit operator?
With &
or |
you would lose information: different inputs of the same length would have a bit higher chance of collision. In particular, an &
with a length that is a power of 2, would be disastrous, as it could only yield 2 different values (the length itself or zero). Or an |
with a length that has mostly 1 bits (like 0xffff): this would again limit the possible outcomes.
Doing a +
would be a viable alternative, but then you would want to make sure the result remains in the 32-bit range again.
Upvotes: 1