user3516302

Reputation: 81

C Word Count program

I am trying to write a program that counts the number of characters, words, and lines in a text. The text is:

It was a dark and stormy night;
the rain fell in torrents - except
at occasional intervals, when it was
checked by a violent gust of wind
which swept up the streets (for it is
in London that our scene lies),
rattling along the housetops, and fiercely
agitating the scanty flame of the lamps
that struggled against the darkness.

  Edward Bulwer-Lytton's novel Paul Clifford.

I keep getting 62 words instead of 64. Any suggestions?

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

int main() {
    int tot_chars = 0;     /* total characters */
    int tot_lines = 0;     /* total lines */
    int tot_words = 0;     /* total words */
    int boolean;
    /* EOF == end of file */
    int n;
    while ((n = getchar()) != EOF) {
        tot_chars++;
        if (isspace(n) && !isspace(getchar())) {
            tot_words++;
        }
        if (n == '\n') {
            tot_lines++;
        }
        if (n == '-') {
            tot_words--;
        }
    }
    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);

    // Should be 11 64 375
    // rn     is 11 65 375
    return 0;
}

Upvotes: 2

Views: 35802

Answers (6)

debug

Reputation: 1079

$ ./a.out " a b " "a b c " "a b c d"
s =  a b , words_cnt= 2
 s = a b c , words_cnt= 3
 s = a b c d, words_cnt= 4

$ ./a.out "It was a dark and stormy night;
> the rain fell in torrents - except
......
  Edward Bulwer-Lytton's novel Paul Clifford., words_cnt = 64

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


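/* Count whitespace-separated words in s. */
int
count_words(const char *s)
{
    size_t i = 0;
    int w = 0;

    while (s[i] != '\0')
    {
        /* cast to unsigned char: isspace() is undefined for negative values */
        if (isspace((unsigned char)s[i]))
        {
            i++;    /* skip separators */
            continue;
        }
        w++;        /* first byte of a new word */
        while (s[i] != '\0' && !isspace((unsigned char)s[i]))
        {
            i++;    /* skip the rest of the word */
        }
    }

    return w;
}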

int
main(int argc, const char *argv[])
{
    int i;

    if (argc < 2)
    {
        printf("[*] Usage: %s <str1> <str2> ...\n", argv[0]);
        return -1;
    }

    for (i = 1; i < argc; i++)
    {
        printf("s = %s, words_cnt= %d\n ", argv[i], count_words(argv[i]));
    }

    return 0;
}

Upvotes: 0

chqrlie

Reputation: 145297

There are multiple problems in your code:

  • in the test if (isspace(n) && !isspace(getchar())), the second getchar() may consume a byte from the file that is never counted in tot_chars. Furthermore, tot_words is not incremented if 2 words are separated by 2 white space characters. This causes darkness. and Edward to be counted as a single word.
  • you decrement tot_words when you see a hyphen, which is incorrect as words are separated by white space only. This causes Bulwer-Lytton's to be counted as 1-1, i.e. zero. Hence you only get 62 words instead of 64.
  • on a lesser note, the name n is confusing for a byte read from the file: it is usually more appropriate for a count. The idiomatic name for a byte read from a file is c, and the type int is correct, as it must accommodate all values of unsigned char plus the special value EOF.

To detect word boundaries, you should use a state and update the word count when the state changes:

#include <ctype.h>
#include <stdio.h>

int main(void) {
    int tot_chars = 0;     /* total characters */
    int tot_lines = 0;     /* total lines */
    int tot_words = 0;     /* total words */
    int in_space = 1;
    int c, last = '\n';

    while ((c = getchar()) != EOF) {
        last = c;
        tot_chars++;
        if (isspace(c)) {
            in_space = 1;
            if (c == '\n') {
                tot_lines++;
            }
        } else {
            tot_words += in_space;
            in_space = 0;
        }
    }
    if (last != '\n') {
        /* count last line if not linefeed terminated */
        tot_lines++;
    }

    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);

    return 0;
}

Upvotes: 2

Vino

Reputation: 922

I checked your code and it works fine: I got the expected total word count. It seems the code has been edited since the original post.

The output I got after running the code is attached as an image.

Upvotes: 0

Mysterious Jack

Reputation: 641

Actually, I now think you have to modify the program. Assuming that words are separated by a single space (or any other whitespace character) and counting on that basis will not work if your text has two or more whitespace characters between words, because each extra separator will also be counted as a word when no actual word is there.

I think your last if block is really messy. You are using ispunct() to decrement tot_words, but the words in your text contain punctuation marks (without spaces), which means they are part of the words, so you should not decrement for them.

Previously I thought we should check only for the '-' character in the last if block, as it is used with surrounding spaces in the first paragraph of the text. But it is used again in the last line of the text without any spaces, so I think you should drop the last if block completely and count '-' as a word, for simplicity of the logic.

I have modified the first if block so that your program stays correct even when two or more whitespace characters separate words.

/* boolean must be initialized to 0 (outside of a word) before the loop */
if (isspace(n)) {   /* isspace() is true for ' ', '\t', '\n', '\r', so (isspace(n) || n == '\n') is redundant */
    boolean = 0;    /* outside of a word */
} else if (boolean == 0) {
    tot_words++;
    boolean = 1;    /* inside of a word */
}

if (n == '\n')
    tot_lines++;

Upvotes: 1

user207064

Reputation: 665

Change

if (n == '\n') {
    tot_lines++;
    tot_words++;
}

to

if (n == '\n') {
    tot_lines++;
}

You are already counting a word at each newline in

if (isspace(n) || n == '\n') {
    tot_words++;
}

So effectively you are incrementing the word counter one time more than required for each line.

Upvotes: 0

Michelle

Reputation: 2900

Both of the following conditionals increment your word count on newline characters, which means that every word followed by a newline instead of a space is counted twice:

if (isspace(n) || n == '\n'){
     tot_words++;
}
if (n=='\n'){
     tot_lines++;
     tot_words++;
}

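Note that isspace(n) is already true for '\n', so the || n == '\n' bit is redundant. If you get rid of the tot_words++ in the second conditional, you should get the correct count.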

Upvotes: 0
