Dev Dog

Reputation: 75

Simple Word Count in C

I'm writing a word count function in C as part of a larger text file processing program, but the results are inconsistent between my test files.

Below is the relevant code snippet:

#define OUT 0
#define IN  1
unsigned countWords(char * filename) {

    FILE * fp = fopen(filename, "r");
    int state = OUT;
    int wc = 0;
    char c;

    if(fp == NULL) {
        perror("Could not open file");
    }
    while((c = fgetc(fp)) != EOF) {
        printf("c: %c & wc: %d\n", c, wc);
        if(c == ' ' || c == '\n' || c == '\t') {
            state = OUT;
        }
        else if (state == OUT) {
            state = IN;
            ++wc;
        }
    }
    fclose(fp);
    return wc;
}

I'm testing this function with two short .txt files:

word word word  word
word

and:

word word word
word 



Note that in the second .txt, the last word is followed by 3 newline characters.

When I run both files through the function, the first is always counted correctly with a return of 5. For the second file, however, the function seems to count the 3 newline characters at the end of the file as words and returns 7 instead of 4.

I'm sure I'm missing something obvious but I would appreciate any help.

Upvotes: 1

Views: 248

Answers (1)

th3hunter

Reputation: 46

Tip: if you move a text file from Windows to a Unix machine, you can run the dos2unix command on it. That converts the Windows line endings (\r\n) to Unix ones (\n), so your code won't need to check for the Windows newline format.
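If converting the files isn't an option, the function itself can be made to tolerate both formats. This is only a sketch, and it assumes the second file really was saved with Windows line endings, which would explain the numbers: the code in the question treats only ' ', '\n' and '\t' as separators, so each '\r' that follows a space or a newline starts a new "word" (4 real words + 3 stray '\r' = 7). In the first file the '\r' comes directly after a word, so it is absorbed into that word and the count stays at 5. Using isspace() from <ctype.h> covers '\r' as well. The helper name countWordsStream is my own, and I changed the parameter to const char *; everything else follows the question. The sketch also stores the result of fgetc() in an int and returns early when fopen() fails, two things the original code does not do.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-separated words on an already-open stream.
 * isspace() treats ' ', '\t', '\n', '\r', '\v' and '\f' as separators,
 * so the '\r' of a Windows "\r\n" line ending no longer starts a word.
 * (countWordsStream is a hypothetical helper name, not from the question.) */
unsigned countWordsStream(FILE *fp) {
    unsigned wc = 0;
    int inWord = 0;
    int c; /* int, not char: fgetc() returns an int so EOF stays distinguishable */

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            inWord = 0;
        } else if (!inWord) {
            inWord = 1;
            ++wc;
        }
    }
    return wc;
}

/* Same role as countWords() in the question, with an early return on failure. */
unsigned countWords(const char *filename) {
    FILE *fp = fopen(filename, "r");
    unsigned wc;

    if (fp == NULL) {
        perror("Could not open file");
        return 0; /* do not pass a NULL stream to fgetc() */
    }
    wc = countWordsStream(fp);
    fclose(fp);
    return wc;
}
```

Calling countWords("second.txt") (the file name is just an example) should then give the same result whether the file uses \n or \r\n line endings.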

Upvotes: 1
