Lukali

Reputation: 343

Why is my character count incorrect?

The following code is supposed to count the words:

int count = 0;
for (int i = 0; chars[i] != EOF; i++)
{
    if (chars[i] == ' ')
    {
        count++;
    }
}

My problem is that it doesn't count the words correctly.

For example, if my file.txt has the following text in it:

spaced-out there's I'd like

It says I have 6 words, when according to MS Word I'd have 4.

spaced-out and in

Gives me a word count of 4.

spaced out and in

Gives me a word count of 6.

I'm sorry if this question has been answered before. Google ignores special characters in search queries, which makes coding questions like this hard to search for. I'd prefer to count the words simply by checking whether each character is a space or not.

I tried looking for answers, but no one seemed to have exactly the same problem. I know that .txt files created on Windows may have lines ending in \r\n, but those characters should then just be part of the last word. For example:

spaced out and in\r\n

I believe it should still give me 4 words. Also, when I add || chars[i] == '\n' like this:

for (int i = 0; chars[i] != EOF || chars[i] == '\n'; i++)

I get even more words: 8 for the line

spaced out and in

I am running this on a Linux-based server, connecting through an SSH client on Windows. The characters come from a .txt file.


Edit: Okay, here is the full code; I left out the #include lines when posting it.

#define BUF_SIZE 500            
#define OUTPUT_MODE 0700        

int main(int argc, char *argv[])
{
    int input, output;
    int readSize = 1, writeSize;            
    char chars[BUF_SIZE];   
    int count = 0;

    input = open(argv[1], O_RDONLY);                

    output = creat(argv[2], OUTPUT_MODE);   

    while (readSize > 0)                
    {
        readSize = read(input, chars, BUF_SIZE); 
        if (readSize < 0)
            exit(4);

        for (int i = 0; chars[i] != '\0'; i++)
        {
            if (chars[i] == ' ')
            {
                count++;
            }
        }

        writeSize = write(output, chars, readSize);     
        if (writeSize <= 0)             
        {
            close(input);       
            close(output);
            printf("%d words\n", count);
            exit(5);
        }
    }
}

Upvotes: 4

Views: 517

Answers (1)

Iharob Al Asimi

Reputation: 53016

I am writing this answer because I think I know what your confusion is. Note that you did not explain how you read the file, so I'll give an example and explain why we test != EOF, even though EOF is not a character that you read from a file.

It appears that you think EOF is a character that is stored in the file; it is not. If you just want to count words, you can do something like

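/* file is a FILE * opened earlier with fopen() */
int count = 0;
int chr;  /* int, not char, so that EOF can be told apart from every character */
while ((chr = fgetc(file)) != EOF)
    count += (chr == ' ') ? 1 : 0;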

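Note that chr MUST be of type int. fgetc() returns either the character it read (as an unsigned char converted to int) or EOF, a negative int constant that cannot be confused with any character, and EOF is certainly not present in the file! Functions like fgetc() return it to indicate that there is nothing more to read. Note also that a read must actually be attempted for EOF to be returned: reading the last character of the file does not by itself signal end-of-file.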
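The same principle applies to the read()-based program in the question's edit. Here is a minimal sketch, assuming a POSIX system: read() returns the number of bytes it stored (0 at end of file, -1 on error), and it neither writes an EOF marker nor adds a '\0' terminator, so the inner loop has to stop at that count. The helper names count_spaces and count_spaces_fd are hypothetical, and like the snippet above this counts spaces, not words.

```c
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 500

/* Count ' ' bytes among the first n bytes of buf.
   Only those n bytes are valid: read() stores no EOF marker and
   no '\0' terminator, so n is the only reliable bound. */
static long count_spaces(const char *buf, ssize_t n)
{
    long count = 0;
    for (ssize_t i = 0; i < n; i++)
        if (buf[i] == ' ')
            count++;
    return count;
}

/* Hypothetical helper: read fd in BUF_SIZE chunks until read()
   returns 0 (end of file). Returns the total number of spaces,
   or -1 if read() fails. */
long count_spaces_fd(int fd)
{
    char chars[BUF_SIZE];
    long count = 0;
    ssize_t n;

    while ((n = read(fd, chars, sizeof chars)) > 0)
        count += count_spaces(chars, n);

    return n < 0 ? -1 : count;
}
```

In the question's program, the corresponding change would be to bound the inner loop by readSize, i.e. for (int i = 0; i < readSize; i++), instead of testing for '\0', because the buffer is never null-terminated.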

Oops, also note that my sample code counts spaces rather than words, so it will not count the last word. But that's for you to figure out.

Also, this would count every extra space in a run of consecutive spaces as another "word", something you should also work out.
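Both problems (the missing last word and runs of spaces) disappear if you count the start of each word instead of the spaces between words. Here is a minimal sketch, assuming that any character classified by isspace() separates words. That means '\n', '\t' and the '\r' from Windows \r\n line endings also act as separators, while a hyphenated token like spaced-out stays one word. The function name count_words is hypothetical, and MS Word's own rules may differ in edge cases.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count whitespace-separated words in a stream.
   A word is counted when a non-space character follows whitespace
   (or the start of the stream), so runs of spaces are harmless and
   the last word counts even without a trailing space. */
long count_words(FILE *file)
{
    long count = 0;
    int in_word = 0;  /* 1 while we are inside a word */
    int chr;          /* int, not char, so EOF is distinguishable */

    while ((chr = fgetc(file)) != EOF) {
        if (isspace(chr)) {
            in_word = 0;          /* any whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

With this approach, both spaced out and in\r\n and spaced-out there's I'd like give 4 words.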

Upvotes: 4
