Reputation:
i have a key with type AcccAA where A-[A...Z] (capital letters), and c is [1..9]. i have 1500 segments. Now my temp hash function
int HashFunc(string key){
int Adress = ((key[0] + key[1] + key[2] + key[3] + key[4] + key[5]) - 339) * 14;
return Adress;
}
and Excel show a lot of collision in center (from 400 to 900)
Please tell me the hash function to be more evenly.
Upvotes: 3
Views: 1960
Reputation: 45665
One way is to construct a guaranteed collision-free number (which will not make your hash table collision free of course), as long as the possible keys fit in an integral type (e.g. int
):
int number = (key[0] - 'A') + 26 * (
(key[1] - '0') + 10 * (
(key[2] - '0') + 10 * (
(key[3] - '0') + 10 * (
(key[4] - 'A') + 26 * (
(key[5] - 'A')
)))));
This works since 26 * 10 * 10 * 10 * 26 * 26 = 17576000
which fits into an int
fine.
Finally simply hash this integer.
Upvotes: 1
Reputation: 372674
A common way to build a hash function in this case is to evaluate some polynomial with prime coefficients, like this one:
int address = key[0] +
31 * key[1] +
137 * key[2] +
1571 * key[3] +
11047 * key[4] +
77813 * key[5];
return address % kNumBuckets;
This gives a much larger dispersion over the key space. Right now, you get a lot of collisions because anagrams like AB000A
and BA000A
will collide, but with the above hash function the hash is much more sensitive to small changes in the input.
For a more complex but (probably) way better hash function, consider using a string hash function like the shift-add-XOR hash, which also gets good dispersion but is less intuitive.
Hope this helps!
Upvotes: 3