Reputation: 1721
I have 64-bit positive integers (range from 0 to 263 - 1) and I want to hash them into 32-bit positive integers (0 to 231 - 1 range).
My data has a Gaussian distribution. Can anyone suggest a hash function that will give a low number of collisions for this distribution?
(Original question was here, which I've improved upon.)
Upvotes: 4
Views: 2650
Reputation: 23550
You can first map your input data through the (expected) cummulative distribution function with the result of having output that then is (expected to be) evenly distributed. You can then put that data into a regular 64-to-32-bit-hash function.
Upvotes: 3
Reputation: 533432
Based on the hash for Long which is a 64-bit integer.
int hash = (int) ((l >>> 32) ^ l);
BTW: A gaussian distribution is signed do I don't believe it would appropriate for an unsigned value.
If you have something which follows a guassian distribution which has be scaled and shifted, the lower 32-bit may be still completely random. (Depending on the scale) If the lower 32-bit are random, it doesn't matter what the upper bits are (they can all be 0) and the hash will still be pseudo-random.
BTW: Even if your hash is unique in converting to a 32-bit value, you will have to reduce this further to save memory (Unless you have your own hash table which is 2^32 in size) This means after reducing the value further to something reasonable e.g. double the size of the number of samples, you will have some collisions (unless it turns out your 64-bit value is far, far more bits than you need)
Upvotes: 2