Reputation: 1095
This is a prefix hashing function. i want to count the number of collisions in this method but i am not sure how to do it. It seems like it might be simple but i just cant think of a great way to do it....
int HashTable_qp::preHash(string & key, int tableSize )
{
string pad = "AA";
//some words in the input are less than 3 letters
//I choose to pad the string with A because all padded characters
//have same ascii val, which is low, and will hopefully alter the results less
if (key.length() < 3)
{
key.append(pad);
}
return ( key[0] + 27 * key[1] + 729 * key[2] ) % tableSize;
}
Upvotes: 0
Views: 2206
Reputation: 44250
create a histogram:
unsigned histogram[tablesize] = {0};
generate some (all) possible strings and compute their hashval, and update the histogram accordingly:
for(iter=0; iter < somevalue; iter++) {
hashval = hashfunc( string.iterate(iter) ); // I don't know c++
histogram[hashval] +=1;
}
Now you have to analyze the hashtable for lumps / clusters. Rule of thumb is that for (tablesize==iter)
, you expect about 30 % cells with count =1, and about 30 % empty; the rest has two or more.
If you sum all the (count*(count+1))/2
, and divide by the tablesize, you should expect around 1.5. A bad hashfunction gives higher values, a perfect hash would only have cells with count=1 (and thus: ratio=1) With linear probing you should of course never use tablesize=niter, but make tablesize bigger, say twice as big. You can use the same metric (number of probes / number of entries), to analyse its performance, though.
UPDATE: a great introduction on hashfunctions and their performance can be found at http://www.strchr.com/hash_functions .
Upvotes: 1
Reputation: 36
If it's an array as the underlying data structure do:
int hash = preHash(&key, array.length);
if(array[hash] != null)
this.count++;
If it's an array of linked lists do:
if(array[hash] != null && *(array[hash]) != null)
this.count++
If you only have access to the stl library I believe just testing that element is null before adding it would be enough after calling the hash function.
Upvotes: 1
Reputation: 86260
You can create an array of integers, each representing one hash. When you're done making the hashes loop back through the array in a nested loop. If you had the following array,
[0] -> 13
[1] -> 5
[2] -> 12
[3] -> 7
[4] -> 5
For each item i in 0..n, check items i+1..n for matches. In English that would be: check if each element equals any of the elements after it.
Upvotes: 0