Jenny Calisay
Jenny Calisay

Reputation: 463

Hash function for strings not working on some strings?

Basically my program reads a text file with the following format:

3
chairs
tables
refrigerators

The number on the first line indicates the number of items in the file to read.

Here's my hash function:

int hash(string& item, int n) {
    int hashVal = 0;
    int len = item.length();

    for(int i = 0; i < len; i++)
      hashVal = hashVal*37 + item[i];

    hashVal %= n;   

    if(hashVal < 0) hashVal += n;

    return hashVal;
}

when my program read the text file above, it was successful. But when I tried another one:

5
sabel
ziyarah
moustache
math
pedobear

The program would freeze. Not a segmentation fault or anything but it would just stop.

Any ideas?

Edit:

int n, tableSize;
myFile >> n;

tableSize = generateTableSize(n); 

string item, hashTable[tableSize];

for(int i = 0; i < tableSize; i++)
    hashTable[i] = "--";

while(myFile >> item && n!=0) {
    int index = hash(item,tableSize);

    if(hashTable[index] == "--")
        hashTable[index] = item;

    else {
        int newIndex = rehash(item,tableSize);
        while(hashTable[newIndex] != "--") {
            newIndex = rehash(item,tableSize);
        }
        hashTable[newIndex] = item;
    }
    n--;
}

int rehash(string item, int n)  {
    return hash(item,n+1);
}

Upvotes: 1

Views: 198

Answers (1)

Tommy Andersen
Tommy Andersen

Reputation: 7220

The code freezes because it ends in an endless loop:

int index = hash(item,tableSize);

if(hashTable[index] == "--")
    hashTable[index] = item;
else {
    int newIndex = rehash(item,tableSize);
    while(hashTable[newIndex] != "--") {
        newIndex = rehash(item,tableSize);
    }
    hashTable[newIndex] = item;
}

You continuously recalculate the index, but do not change the input parameters, so the output stays the same, and therefore it is being recalculated again.

In the code above newIndex is calculated, based on the same inputs as index was calculated from using a different calculaton function though, so most likely it will have a different value than the first time, however the new index is also occupied. So we recalculate the newIndex again this time using the same function as before, with the exact same input, which gives the exact same output again. You look up the same index in the hash table, which is still the same value as the last time you did so, so you recalculate again, once again with the same input parameters, giving the same output, which you look up in the hashtable once again, etc.

The reason why you didn't see this with the first 3 lines, is that you did not have a collision (or at least only a single collisison, meaning the newIndex calculated from the rehash function was useful the first time).

The solution is not to increment the table size (since incrementing the table size, will at best lower the chance of collision which in it self can be good, but won't solve your problem entirely), but to either alter the inputs to your functions, so you get a different output, or change the hashtable structure.

I always found Sedgewick's book on algorithms in C++ useful, there is a chapter on hashing it.

Sadly I don't have my copy of Algorithms in C++ at hand, so I cannot tell you how Sedgewick solved it, but I would suggest for the simple educational purpose of solving your problem, starting by simply incrementing the index by 1 until you find a free slot in the hash table.

Upvotes: 4

Related Questions