Reputation: 75
I'm writing a word count function in C as part of a larger text-file processing program, but I'm running into a discrepancy in the results.
Below is the relevant code snippet:
#define OUT 0
#define IN 1

unsigned countWords(char * filename) {
    FILE * fp = fopen(filename, "r");
    int state = OUT;
    int wc = 0;
    char c;
    if (fp == NULL) {
        perror("Could not open file");
    }
    while ((c = fgetc(fp)) != EOF) {
        printf("c: %c & wc: %d\n", c, wc);
        if (c == ' ' || c == '\n' || c == '\t') {
            state = OUT;
        }
        else if (state == OUT) {
            state = IN;
            ++wc;
        }
    }
    fclose(fp);
    return wc;
}
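For these tests I call it from a minimal driver along these lines (just a sketch for testing; the real program does more):

#include <stdio.h>

unsigned countWords(char * filename);

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s file.txt\n", argv[0]);
        return 1;
    }
    /* Print the word count for the file named on the command line. */
    printf("word count: %u\n", countWords(argv[1]));
    return 0;
}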
I'm testing this function with two short .txt files:
word word word word word
and:
word word word word
Note that in the second .txt, the last word is followed by 3 newline characters.
When I run these files through, the first is always counted correctly and returns 5, but the second seems to be counting the newline characters at the end of the file and returns 7 instead of the expected 4.
I'm sure I'm missing something obvious, but I would appreciate any help.
Upvotes: 1
Views: 248
Reputation: 46
Tip: if you move a text file from Windows to a Unix machine, you can always run the dos2unix command on the files first, and then you won't need to worry about the CRLF newline format that Windows uses.
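If you'd rather handle it in the code itself, a minimal sketch (assuming CRLF line endings are what's inflating the count, and reusing fp, OUT, and IN from your snippet) is to classify separators with isspace() from <ctype.h>, which also covers '\r', and to read into an int so the EOF comparison is reliable:

#include <ctype.h>
#include <stdio.h>

/* Same counting loop, but isspace() treats '\r' from Windows
   CRLF line endings as a word separator, and c is an int so
   comparing against EOF works correctly. */
int c;
int state = OUT;
unsigned wc = 0;
while ((c = fgetc(fp)) != EOF) {
    if (isspace(c)) {
        state = OUT;
    } else if (state == OUT) {
        state = IN;
        ++wc;
    }
}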
Upvotes: 1