Reputation: 3965
I'm attempting to hash the values
10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0
I need a function that will map them to an array that has a size of 13 without causing any collisions.
I've spent several hours thinking this over and googling and can't figure this out. I haven't come close to a viable solution.
How would I go about finding a hash function of this sort? I've played with gperf, but I don't really understand it and I couldn't get the results I was looking for.
Upvotes: 21
Views: 14963
Reputation: 1053
Try the following which maps your n values to unique indices between 0 and 12 (1369%(n+1))%13
Upvotes: 0
Reputation: 43456
On some platforms (e.g. embedded), modulo operation is expensive, so % 13
is better avoided. But AND
operation of low-order bits is cheap, and equivalent to modulo of a power-of-2.
I tried writing a simple program (in Python) to search for a perfect hash of your 11 data points, using simple forms such as ((x << a) ^ (x << b)) & 0xF
(where & 0xF
is equivalent to % 16
, giving a result in the range 0..15, for example). I was able to find the following collision-free hash which gives an index in the range 0..15 (expressed as a C macro):
#define HASH(x) ((((x) << 2) ^ ((x) >> 2)) & 0xF)
Here is the Python program I used:
data = [ 10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0 ]
def shift_right(value, shift_value):
"""Shift right that allows for negative values, which shift left
(Python shift operator doesn't allow negative shift values)"""
if shift_value == None:
return 0
if shift_value < 0:
return value << (-shift_value)
else:
return value >> shift_value
def find_hash():
def hashf(val, i, j = None, k = None):
return (shift_right(val, i) ^ shift_right(val, j) ^ shift_right(val, k)) & 0xF
for i in xrange(-7, 8):
for j in xrange(i, 8):
#for k in xrange(j, 8):
#j = None
k = None
outputs = set()
for val in data:
hash_val = hashf(val, i, j, k)
if hash_val >= 13:
pass
#break
if hash_val in outputs:
break
else:
outputs.add(hash_val)
else:
print i, j, k, outputs
if __name__ == '__main__':
find_hash()
Upvotes: 5
Reputation: 146221
I tried a few things and found one semi-manually:
(n ^ 28) % 13
The semi-manual part was the following ruby script that I used to test candidate functions with a range of parameters:
t = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]
(1..200).each do |i|
t2 = t.map { |e| (e ^ i) % 13 }
puts i if t2.uniq.length == t.length
end
Upvotes: 13
Reputation: 3274
Just some quasi-analytical ramblings:
In your set of numbers, eleven in all, three are odd and eight are even. Looking at the simplest forms of hashing - %13 - will give you the following hash values: 10 - 3, 100 - 9, 32 - 6, 45 - 6, 58 - 6, 126 - 9, 3 - 3, 29 - 3, 200 - 5, 400 - 10, 0 - 0
Which, of course, is unusable due to the number of collisions. Something more elaborate is needed.
Why state the obvious? Considering that the numbers are so few any elaborate - or rather, "less simple" - algorithm will likely be slower than either the switch statement or (which I prefer) simply searching through an unsigned short/long vector of size eleven positions and using the index of the match.
Why use a vector search?
Upvotes: 3
Reputation: 3311
Bob Jenkins has a program for this too: http://burtleburtle.net/bob/hash/perfect.html
Unless you're very lucky, there's no "nice" perfect hash function for a given dataset. Perfect hashing algorithms usually use a simple hashing function on the keys (using enough bits so it's collision-free) then use a table to finish it off.
Upvotes: 3
Reputation: 1635
I did a quick check and using the SHA256 hash function and then doing modular division by 13 worked when I tried it in Mathematica. For c++ this function should be in the openssl library. See this post.
If you were doing a lot of hashing and lookup though, modular division is a pretty expensive operation to do repeatedly. There is another way of mapping an n-bit hash function into a i-bit indices. See this post by Michael Mitzenmacher about how to do it with a bit shift operation in C. Hope that helps.
Upvotes: 0
Reputation: 28099
if you know the exact keys then it is trivial to produce a perfect hash function -
int hash (int n) {
switch (n) {
case 10: return 0;
case 100: return 1;
case 32: return 2;
// ...
default: return -1;
}
}
Upvotes: 24