Reputation: 2029
Here is a small code snippet.
while((c = fgetc(fp)) != -1)
{
cCount++; // character count
if(c == '\n') lCount++; // line count
else
{
if(c == ' ' && prevC != ' ') wCount++; // word count
}
prevC = c; // previous character equals current character. Think of it as memory.
}
Now when I run wc on a file containing the snippet above (as is), I get 48 words, but when I run my program on the same input, I get 59 words.
How can I calculate the word count exactly like wc does?
Upvotes: 0
Views: 3186
Reputation: 12679
You can do:
#include <stdio.h>
int count(void)
{
unsigned int cCount = 0, wCount = 0, lCount = 0;
int incr_word_count = 0;
int c; /* int, not char: fgetc returns an int so that EOF is distinct from every byte value */
FILE *fp = fopen ("text", "r");
if (fp == NULL)
{
printf ("Failed to open file\n");
return -1;
}
while((c = fgetc(fp)) != EOF)
{
cCount++; // character count
if(c == '\n') lCount++; // line count
if (c == ' ' || c == '\n' || c == '\t')
incr_word_count = 0;
else if (incr_word_count == 0) {
incr_word_count = 1;
wCount++; // word count
}
}
fclose (fp);
printf ("line : %u\n", lCount);
printf ("word : %u\n", wCount);
printf ("char : %u\n", cCount);
return 0;
}
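One detail worth spelling out: fgetc returns an int, so the variable that receives it should be an int too, because a char cannot represent every byte value and EOF at the same time. A minimal sketch of the difference follows; both function names are made up for illustration and are not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: mimics storing fgetc()'s result in a plain char.
   Where char is signed, the byte 0xFF typically becomes -1 and compares
   equal to EOF; where char is unsigned, no value ever equals EOF. */
int char_sees_eof(unsigned char byte)
{
    char c = (char)byte;
    return c == EOF;
}

/* Hypothetical helper: mimics storing the result in an int, as fgetc()
   intends. Every byte value 0..255 stays distinct from the negative EOF. */
int int_sees_eof(unsigned char byte)
{
    int c = byte;
    return c == EOF;
}
```

With a signed char, a file containing the byte 0xFF could end the loop early; with an unsigned char, the comparison with EOF would never be true and the loop would not terminate.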
Upvotes: 0
Reputation: 56
There is an example of the function you want in the book "The C Programming Language" by Brian W. Kernighan and Dennis M. Ritchie. As the authors say, this is "a bare-bones version of the UNIX program wc". Altered to count only words, it looks like this:
#include <stdio.h>
#define IN 1 /* inside a word */
#define OUT 0 /* outside a word */
/* nw counts words in input */
int main(void)
{
int c, nw, state;
state = OUT;
nw = 0;
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t')
state = OUT;
else if (state == OUT) {
state = IN;
++nw;
}
}
printf("%d\n", nw);
}
Upvotes: 1
Reputation: 7324
You are treating anything that isn't a space as a valid word. This means that a newline followed by a space is a word, and since your input (which is your code snippet) is indented you get a bunch of extra words.
You should use isspace to check for whitespace instead of comparing the character to ' ':
while((c = fgetc(fp)) != EOF)
{
cCount++;
if (c == '\n')
lCount++;
if (isspace(c) && !isspace(prevC))
wCount++;
prevC = c;
}
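Two things the loop above leaves open are the initial value of prevC and a last word that is not followed by whitespace. A minimal sketch that sidesteps both by counting word starts instead of word ends is shown here. It works on an in-memory string to keep it easy to test, and count_words is a name made up for this example; the same logic applies to characters read with fgetc.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: treats every run of non-whitespace characters
   as one word and returns how many such runs the string contains. */
size_t count_words(const char *text)
{
    size_t words = 0;
    int prev = ' ';  /* pretend the input is preceded by whitespace */
    const unsigned char *p;

    /* unsigned char keeps the argument to isspace() in its valid range */
    for (p = (const unsigned char *)text; *p != '\0'; p++) {
        /* a word starts where a non-space follows a space */
        if (!isspace(*p) && isspace(prev))
            words++;
        prev = *p;
    }
    return words;
}
```

For ordinary ASCII text this should agree with wc -w, although edge cases such as control characters or multibyte locales may differ.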
Upvotes: 1
Reputation: 136
Instead of checking only for spaces, you should check for all whitespace characters: space, \t, \n, and so on.
This will give the correct results.
You can use isspace() from <ctype.h>.
Change the line
if(c == ' ' && prevC != ' ') wCount++;
to
if(isspace(c) && !isspace(prevC)) wCount++;
This would give the correct results.
Don't forget to include <ctype.h>
Upvotes: 0