PTN
PTN

Reputation: 1692

Hash function C

I am having trouble implementing my hash function for my hash table.

I want to hash my words such that A = 1, B = 2, C = 3, and so on. The position of the letter in the word is irrelevant, since we will consider permutations of the word. Moreover, the case of the letter will be irrelevant in this problem as well, so the value of a = the value of A = 1.

And for strings, abc = 1 + 2 + 3 = 6, bc = 2 + 3 = 5, etc.

And for cases where ab = 3 and aaa = 3, I have already had a way to handle that situation. Right now I just want to get the hash value.

The problem I am having right now is that aaa is giving me 1, and ab is giving me 2.

Below is my code:

int hash(char *word)
{
   int h = 1;
   int i, j;

   char *A;
   char *a;
   // an array of 26 slots for 26 uppercase letters in the alphabet
   A = (char *)malloc(26 * sizeof(char));
   // an array of 26 slots for 26 lowercase letters in the alphabet
   a = (char *)malloc(26 * sizeof(char));

   for (i = 0; i < 26; i++) {
      A[i] = (char)(i + 65); // fill the array from A to Z
      a[i] = (char)(i + 97); // fill the array from a to z
   }

   for (i = 0; i < strlen(word); i++) {
      //printf("HIT\n");
      for (j = 0; j < 26; j++) {
         // upper and lower case have the same hash value
         if (word[i] == A[j] || word[i] == a[j]) {
            h = h + j; // get the hash value of the word
            //printf("HIT 2\n");
            break;
         }
      }
   }

   printf("H: %d\n", h);

   return h;
}

Upvotes: 0

Views: 2531

Answers (3)

Daniel Jour
Daniel Jour

Reputation: 16156

Multiple issues:

  1. You still aren't freeing the arrays you allocated
  2. Initial value of 1 for h makes no sense
  3. You add the index to the hash. 'A' and 'a' are at index 0, so you're adding 0 in that case (so no matter how many 'a' s you give your code will return 1)
  4. Why a dynamic array? You know the size, it isn't going to change. You could use

    char A[26];
    char a[26]; // you can also add initialisation, e.g. = {'a', 'b', ...
    
  5. Why an array in the first place?

So, here is the quick fix, staying close to your code.

Taking all of the above into account, you could simplify to:

int hash(char const * string) {
  int h = 0;
  for (; *string; ++string) {
    int index = tolower(*string) - 'a' + 1;
    if ((index > 0) && (index < 27)) {
      h += index;
    }
  }
  return h;
}

Live


When only hashing words with non special characters, you need to handle ignored words in the caller somehow.

char hash(char const * string, int * h) {
  *h = 0;
  for (; *string; ++string) {
    int index = tolower(*string) - 'a' + 1;
    if ((index > 0) && (index < 27)) {
      *h += index;
    } else {
      return 0;
    }
  }
  return 1;
}

That way you can use the return value to test if the word should be ignored.

Upvotes: 2

Raman
Raman

Reputation: 2785

change h=h+j to h=h+j+1 and h=1 to h=0.

Also you should free the allocated memory so include these lines just before return:

free(A);
free(a);

However I don't understand why so complicated code was written for such a simple task.

A much simpler code can be written:

int hash(char *word)
{
   int sum=0;
   while(*word != '\0')
   {

       if(*word >='A' && *word < 'A'+26)
            sum=sum+(*word -'A' + 1);
       else if(*word >='a' && *word < 'a'+26)
            sum=sum+(*word -'a' + 1);
       else
            return -1;

       word++;
   }

   return sum;
}

Upvotes: 2

Spikatrix
Spikatrix

Reputation: 20244

I think that changing

int h = 1;

to

int h = 0;

and

h = h + j;

to

h = h + j + 1;

will fix the issue.


The one other problem is that you forgot to free the malloced memory. Also, there is no need to cast the result of malloc(and family) in C.

This

for (i = 0; i < strlen(word); i++) {

will call strlen in every iteration of the loop. This will reduce the performance of your program. Use

int len = strlen(word);
for (i = 0; i < len; i++) {

instead, which is much faster as strlen isn't called in every iteration. Lastly, sizeof(char) is 1. So you can omit it.

Upvotes: 5

Related Questions