maimun4itu

Reputation: 45

C find all anagrams in the dictionary

I am trying to write a program that finds all anagrams (words that can be formed by rearranging the letters of another word) in the dictionary (the /usr/share/dict/words file on Linux). The dictionary file contains a lot of words that end with "'s", which I want to exclude from checking. Here is what I wrote to do that, but unfortunately the result file contains lines with just the single letter "s", and I have no idea where they come from.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

#define TRUE    1
#define FALSE   0

int wordContainsNonAlpha(char *word);

int main(int argc, const char *argv[]){
   FILE *fp_read = fopen("/usr/share/dict/words", "r");
   char *word = malloc(sizeof(word));
   FILE *fp_write = fopen("words.txt","w");

   while (fgets(word,sizeof(word), fp_read) != NULL){
      if (wordContainsNonAlpha(word)){
         fprintf(fp_write,"%s","\n");
      }
      else{
//         fputs(word, stdout);
         fprintf(fp_write,"%s",word);
        }
   }

   fclose(fp_read);
   fclose(fp_write);

   return 0;
}

int wordContainsNonAlpha(char *word){
   int currentLetter = 0;
   int wordLenght = strlen(word);
   int result = FALSE;
   char ch;

   while ( (currentLetter < wordLenght) && (result == FALSE) ){
      ch = word[currentLetter];
      if (ch == '\''){
//      if (!isalpha(ch)){
         result = TRUE;
         break;
      }
      currentLetter++;
   }

   return result;
}

The result is:

    $ sdiff words.txt /usr/share/dict/words | more
    A                                                               A
                                                                  | A's
                                                                  | AA's
                                                                  | AB's
                                                                  | ABM's
                                                                  | AC's
                                                                  | ACTH's
                                                                  | AI's
                                                                  | AIDS's
                                                                  | AM's
    AOL                                                             AOL
                                                                  | AOL's
                                                                  | ASCII's
                                                                  | ASL's
                                                                  | ATM's
                                                                  | ATP's
                                                                  | AWOL's
                                                                  | AZ's
                                                                  | AZT's
                                                                  <
    Aachen                                                          Aachen
    Aaliyah                                                         Aaliyah
    Aaliyah                                                       | Aaliyah's
    Aaron                                                           Aaron
    Abbas                                                           Abbas
    Abbasid                                                         Abbasid
    Abbott                                                          Abbott
                                                                  | Abbott's
    s                                                             <
    Abby                                                            Abby
                                                                  | Abby's
    Abdul                                                           Abdul
                                                                  | Abdul's
                                                                  <
    Abe                                                             Abe
                                                                  | Abe's
    Abel                                                            Abel
........

If I try to use the function isalpha, the result is even worse: it seems to keep only words of a specific length and does not work correctly at all:

sdiff words.txt /usr/share/dict/words | more
                                                              | A
                                                              | A's
                                                              | AA's
                                                              | AB's
                                                              | ABM's
                                                              | AC's
                                                              | ACTH's
                                                              | AI's
                                                              | AIDS's
                                                              | AM's
                                                              | AOL
                                                              | AOL's
                                                              | ASCII's
                                                              | ASL's
                                                              | ATM's
                                                              | ATP's
                                                              | AWOL's
                                                              | AZ's
                                                              | AZT's
                                                              | Aachen
                                                              <
Aaliyah                                                         Aaliyah
Aaliyah                                                       | Aaliyah's
                                                              | Aaron
                                                              | Abbas
Abbasid                                                         Abbasid
                                                              | Abbott
                                                              | Abbott's
                                                              | Abby
                                                              | Abby's
                                                              | Abdul
                                                              | Abdul's
                                                              | Abe
                                                              | Abe's
                                                              | Abel
                                                              | Abel's
                                                              <
                                                              <
Abelard                                                         Abelard
Abelson                                                         Abelson
Abelson                                                       | Abelson's
Aberdee                                                       | Aberdeen
Aberdee                                                       | Aberdeen's
Abernat                                                       | Abernathy
Abernat                                                       | Abernathy's
Abidjan                                                       <
Abidjan                                                         Abidjan

Could you please help?

Upvotes: 0

Views: 312

Answers (1)

Zermingore

Reputation: 753

Your issue comes from how you size the buffer in your calls to malloc() and fgets():

char *word = malloc(sizeof(word));
// Here, you allocate sizeof(char*) bytes which is only the size of a pointer and not the size of a dictionary word


while (fgets(word,sizeof(word), fp_read) != NULL){
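// sizeof(word) is again the size of a pointer (typically 8 bytes on a 64-bit system),
// so fgets() reads at most 7 characters per call and splits longer lines:
// "Abbott's\n" arrives as "Abbott'" and then "s\n", which is where the lone "s" comes from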

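To fix this, size the buffer with a constant: either the length of the longest word in the dictionary (plus room for the newline and the terminating null character) or an arbitrary value that is large enough (I tested quickly with 64). Pass that same constant to fgets() instead of sizeof(word).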
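
A minimal sketch of that fix, assuming 64 bytes is enough for every line in the dictionary. The helper name copyWordsWithoutApostrophe and the constant MAX_WORD_LEN are made up for illustration and are not part of the original program. Unlike the original loop, it skips rejected words instead of writing an empty line for them.

```c
#include <stdio.h>
#include <string.h>

/* Assumed upper bound on line length (the value tested above). */
#define MAX_WORD_LEN 64

/* Copies every line of `in` that contains no apostrophe to `out`.
   Returns the number of lines written. */
size_t copyWordsWithoutApostrophe(FILE *in, FILE *out)
{
    /* `word` is an array here, so sizeof word is 64, not the size of a pointer. */
    char word[MAX_WORD_LEN];
    size_t written = 0;

    while (fgets(word, sizeof word, in) != NULL) {
        if (strchr(word, '\'') == NULL) {
            fputs(word, out);
            written++;
        }
    }
    return written;
}
```

In main() it could be called as copyWordsWithoutApostrophe(fp_read, fp_write) once both fopen() calls have been checked for NULL. A line longer than 63 characters would still be split across two reads, so choose the constant with some margin.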

Concerning your issue with isalpha(), it's because fgets() stores the '\n' in your word. Since '\n' is not a letter, every line that was read in full is rejected.

From man fgets:

If a newline is read, it is stored into the buffer

So, you can use:

    if (ch != '\n' && !isalpha(ch)){
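
Put together, the check could look like the sketch below. It keeps the name wordContainsNonAlpha from the question; taking a const char * and casting to unsigned char before calling isalpha() are my own additions (passing a negative char value to isalpha() is undefined behaviour).

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the line contains any character that is not a letter,
   ignoring the '\n' that fgets() keeps in the buffer; returns 0 otherwise. */
int wordContainsNonAlpha(const char *word)
{
    for (size_t i = 0; word[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)word[i];
        if (ch != '\n' && !isalpha(ch)) {
            return 1;
        }
    }
    return 0;
}
```

Keep in mind that in the default "C" locale isalpha() accepts only A-Z and a-z, so if the dictionary contains accented words they would be rejected as well.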

Upvotes: 2
