Reputation: 45
I am trying to write a program that finds all anagrams (words that can be formed by rearranging the letters of another word) in the dictionary (the /usr/share/dict/words file on Linux). The dictionary file contains a lot of words that end with "'s", which I want to exclude from checking. Here is what I wrote to do that, but unfortunately the result file contains lines with just the single letter "s", and I have no idea where they come from.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#define TRUE 1
#define FALSE 0
int wordContainsNonAlpha(char *word);
int main(int argc, const char *argv[]){
FILE *fp_read = fopen("/usr/share/dict/words", "r");
char *word = malloc(sizeof(word));
FILE *fp_write = fopen("words.txt","w");
while (fgets(word,sizeof(word), fp_read) != NULL){
if (wordContainsNonAlpha(word)){
fprintf(fp_write,"%s","\n");
}
else{
// fputs(word, stdout);
fprintf(fp_write,"%s",word);
}
}
fclose(fp_read);
fclose(fp_write);
return 0;
}
int wordContainsNonAlpha(char *word){
int currentLetter = 0;
int wordLenght = strlen(word);
int result = FALSE;
char ch;
while ( (currentLetter < wordLenght) && (result == FALSE) ){
ch = word[currentLetter];
if (ch == '\''){
// if (!isalpha(ch)){
result = TRUE;
break;
}
currentLetter++;
}
return result;
}
The result is:
$ sdiff words.txt /usr/share/dict/words | more
A A
| A's
| AA's
| AB's
| ABM's
| AC's
| ACTH's
| AI's
| AIDS's
| AM's
AOL AOL
| AOL's
| ASCII's
| ASL's
| ATM's
| ATP's
| AWOL's
| AZ's
| AZT's
<
Aachen Aachen
Aaliyah Aaliyah
Aaliyah | Aaliyah's
Aaron Aaron
Abbas Abbas
Abbasid Abbasid
Abbott Abbott
| Abbott's
s <
Abby Abby
| Abby's
Abdul Abdul
| Abdul's
<
Abe Abe
| Abe's
Abel Abel
........
If I try to use the isalpha function, the result is even worse: it seems to be looking for words of a specific length and does not work correctly at all:
sdiff words.txt /usr/share/dict/words | more
| A
| A's
| AA's
| AB's
| ABM's
| AC's
| ACTH's
| AI's
| AIDS's
| AM's
| AOL
| AOL's
| ASCII's
| ASL's
| ATM's
| ATP's
| AWOL's
| AZ's
| AZT's
| Aachen
<
Aaliyah Aaliyah
Aaliyah | Aaliyah's
| Aaron
| Abbas
Abbasid Abbasid
| Abbott
| Abbott's
| Abby
| Abby's
| Abdul
| Abdul's
| Abe
| Abe's
| Abel
| Abel's
<
<
Abelard Abelard
Abelson Abelson
Abelson | Abelson's
Aberdee | Aberdeen
Aberdee | Aberdeen's
Abernat | Abernathy
Abernat | Abernathy's
Abidjan <
Abidjan Abidjan
Could you please help!
Upvotes: 0
Views: 312
Reputation: 753
Your issue comes from your call to malloc() and from the size you pass to fgets():
char *word = malloc(sizeof(word));
// Here, you allocate sizeof(char*) bytes, which is only the size of a pointer (typically 8 bytes on a 64-bit system) and not the size of a dictionary word
while (fgets(word,sizeof(word), fp_read) != NULL){
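// fgets reads at most sizeof(word) - 1 characters per call (7 with an 8-byte pointer), so a longer line is split across several calls.
// That is where the lone "s" comes from: "Abbott's" is read as "Abbott'" and then as "s"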
To solve this, use the same fixed size for both the allocation and the fgets() call: either the length of the longest word in the dictionary (plus room for the newline and the terminating '\0') or an arbitrary value that is large enough (I tested quickly with 64).
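A minimal sketch of that fix, assuming 64 bytes is enough for every line in the dictionary. MAX_WORD and copyWordsWithoutApostrophe are names made up for this illustration, not part of the question's code. The buffer is declared as an array, so sizeof(word) really is the buffer size and not the size of a pointer. Unlike the original loop, this version drops rejected lines instead of writing an empty line for them.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64  /* assumed upper bound on line length, newline and '\0' included */

/* Hypothetical helper: copies every line of `in` that contains no apostrophe
   to `out` and returns the number of lines written. */
int copyWordsWithoutApostrophe(FILE *in, FILE *out)
{
    char word[MAX_WORD];  /* an array, so sizeof(word) == MAX_WORD */
    int written = 0;

    while (fgets(word, sizeof(word), in) != NULL) {
        if (strchr(word, '\'') == NULL) {
            fputs(word, out);
            written++;
        }
    }
    return written;
}
```

It could be called with the two streams opened in the question, for example copyWordsWithoutApostrophe(fp_read, fp_write), after checking that both fopen() calls succeeded.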
As for your issue with isalpha(): fgets() stores the '\n' in your word, and '\n' is not a letter, so any buffer that contains the end of a line is flagged as non-alphabetic.
From man fgets:
If a newline is read, it is stored into the buffer
So, you can use:
if (ch != '\n' && !isalpha(ch)){
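Put into the question's function, that check could look like the sketch below. It keeps the name wordContainsNonAlpha from the question; the const qualifier and the cast are additions. The loop stops at the newline instead of testing it, and the character is cast to unsigned char because passing a negative char value to the <ctype.h> functions is undefined behaviour. Note that in the default "C" locale isalpha() accepts only the ASCII letters, so words with accented characters would also be reported as non-alphabetic.

```c
#include <ctype.h>

/* Returns 1 if `word` contains a character that is not a letter, 0 otherwise.
   The trailing newline left in the buffer by fgets() is ignored. */
int wordContainsNonAlpha(const char *word)
{
    for (; *word != '\0' && *word != '\n'; word++) {
        if (!isalpha((unsigned char)*word)) {
            return 1;
        }
    }
    return 0;
}
```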
Upvotes: 2