user2872568
user2872568

Reputation:

Best string hash function for this example

i have a key with type AcccAA where A-[A...Z] (capital letters), and c is [1..9]. i have 1500 segments. Now my temp hash function

int HashFunc(string key){   
    int Adress = ((key[0] +  key[1] + key[2] + key[3] + key[4] + key[5]) - 339) * 14;
    return  Adress;
}

and Excel show a lot of collision in center (from 400 to 900)

Please tell me the hash function to be more evenly.

Upvotes: 3

Views: 1960

Answers (2)

leemes
leemes

Reputation: 45665

One way is to construct a guaranteed collision-free number (which will not make your hash table collision free of course), as long as the possible keys fit in an integral type (e.g. int):

int number = (key[0] - 'A') + 26 * (
              (key[1] - '0') + 10 * (
               (key[2] - '0') + 10 * (
                (key[3] - '0') + 10 * (
                 (key[4] - 'A') + 26 * (
                  (key[5] - 'A')
             )))));

This works since 26 * 10 * 10 * 10 * 26 * 26 = 17576000 which fits into an int fine.

Finally simply hash this integer.

Upvotes: 1

templatetypedef
templatetypedef

Reputation: 372674

A common way to build a hash function in this case is to evaluate some polynomial with prime coefficients, like this one:

int address = key[0] + 
              31 * key[1] + 
              137 * key[2] + 
              1571 * key[3] + 
              11047 * key[4] + 
              77813 * key[5];
return address % kNumBuckets;

This gives a much larger dispersion over the key space. Right now, you get a lot of collisions because anagrams like AB000A and BA000A will collide, but with the above hash function the hash is much more sensitive to small changes in the input.

For a more complex but (probably) way better hash function, consider using a string hash function like the shift-add-XOR hash, which also gets good dispersion but is less intuitive.

Hope this helps!

Upvotes: 3

Related Questions