user2832774

Reputation: 3

C - How to count words in a txt file?

I'm supposed to count how many words there are in a multi-line txt file. A word is defined as a continuous sequence of letters (a through z and A through Z) and apostrophes, separated by any character outside these ranges.

I've got what I think looks right, but the wordcount keeps on coming out wrong. Does anyone see anything weird about my code?

Please ignore the linecount and charcount, as they are working properly. I tried counting the spaces between the words, with 32 being the ASCII code for a space.

#include <stdio.h>

int main()
{
int c;
int charcount = 0;
int wordcount = 1;
int linecount = 0;

while (c != EOF)
{
    c = getchar();
    if (c == EOF)
        break;
    if (c == 10)
        linecount++;

    charcount++;

    if (c == 32)
        wordcount++;

}

printf ("%d %d %d\n", charcount, wordcount, linecount);
return 0;

}

So for example, one of the txt files says:

Said Hamlet to Ophelia,
I'll draw a sketch of thee,
What kind of pencil shall I use?
2B or not 2B?

The word count here should be 21, but I get a wordcount of 18. I also tried counting the number of "\n" characters, which works for this test but fails for the next one.

Thanks in advance!

Upvotes: 0

Views: 3491

Answers (2)

cybermage14

Reputation: 168

Include ctype.h and then change

if (c == 32)
    wordcount++;

to

if (isspace(c))
    wordcount++;

Words are separated by spaces, tabs, and newline characters.
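
To make that change easier to try out, here is a minimal sketch of the question's loop with only the test swapped. The helper name count_with_isspace and reading from a string instead of stdin are assumptions made for illustration. It counts separators rather than words, so the result matches the word count only when words are separated by exactly one whitespace character and the text has no trailing newline.

```c
#include <ctype.h>

/* Hypothetical helper: mirrors the question's loop, but over a string. */
int count_with_isspace(const char *s)
{
    int wordcount = 1;   /* same starting value as in the question */

    for (; *s != '\0'; s++) {
        /* the cast avoids undefined behaviour for negative char values */
        if (isspace((unsigned char)*s))
            wordcount++;
    }
    return wordcount;
}
```

On the Hamlet sample without a trailing newline this gives 21, but a run of two spaces, or punctuation and digits inside a token such as "2B", would throw the count off under the question's definition of a word.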

Upvotes: 1

mcleod_ideafix

Reputation: 11438

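Use a simple finite-state machine (FSM) coded in C. It counts a word each time the input moves from a non-word character to a letter or apostrophe: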

#include <stdio.h>
#include <ctype.h>

/* States: before any input, inside a word, between words */
enum {INITIAL,WORD,SPACE};

int main()
{
  int c;
  int state = INITIAL;
  int wcount = 0;

  c = getchar();
  while (c != EOF)
  {
    switch (state)
    {
      case INITIAL: wcount = 0;
                    if (isalpha(c) || c=='\'')
                    {
                       wcount++;
                       state = WORD;
                    }
                    else
                       state = SPACE;
                    break;

      case WORD:    if (!isalpha(c) && c!='\'')
                       state = SPACE;
                    break;

      case SPACE:   if (isalpha(c) || c=='\'')
                    {
                       wcount++;
                       state = WORD;
                    }
    }
    c = getchar();
  }
  printf ("%d words\n", wcount);
  return 0;
}
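
The same idea can be sketched as a function over a string, which is easier to check against known inputs. This is only an illustrative variant, not the code above: the names is_word_char and count_words are hypothetical, and the three states are collapsed into a single in_word flag.

```c
#include <ctype.h>

/* Hypothetical helper: a word character is a letter or an apostrophe. */
static int is_word_char(int c)
{
    return isalpha(c) || c == '\'';
}

/* Counts words in a NUL-terminated string by counting the transitions
   from "not in a word" to "in a word". */
int count_words(const char *s)
{
    int in_word = 0;   /* 1 while the previous character was a word character */
    int wcount = 0;

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;   /* keep the value valid for isalpha */

        if (is_word_char(c)) {
            if (!in_word)
                wcount++;            /* a new word starts here */
            in_word = 1;
        } else {
            in_word = 0;             /* any other character ends the word */
        }
    }
    return wcount;
}
```

Because only word starts are counted, runs of separators, leading or trailing whitespace, and digits such as the "2" in "2B" do not affect the result.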

Upvotes: -1
