haccks

Reputation: 106012

Average word length for a sentence

I want to calculate average word length for a sentence.

For example, given input abc def ghi, the average word length would be 3.0.

The program works, but I want it to ignore extra spaces between words. So, given the following sentence:

abc  def

(two spaces between the words), the average word length is calculated to be 2.0 instead of 3.0.

How can I take extra spaces between words into account? They should be ignored, which would give an average word length of 3.0 in the example above instead of the erroneous 2.0.

#include <stdio.h>
#include <conio.h>

int main()
{
char ch,temp;
float avg;
int space = 1,alphbt = 0,k = 0;

printf("Enter a sentence: ");

while((ch = getchar()) != '\n')
{
    temp = ch;

    if( ch != ' ')
    {
       alphbt++;
       k++;         // To ignore spaces before first word!!!
    }  
    else if(ch == ' ' && k != 0)
       space++;

}

if (temp == ' ')    //To ignore spaces after last word!!!
   printf("Average word length: %.1f",avg = (float) alphbt/(space-1));
else
   printf("Average word length: %.1f",avg = (float) alphbt/space);

getch();
}               
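To see where the 2.0 comes from, here is a sketch that replays the counting logic of the code above on a fixed string instead of reading from the keyboard. The function name question_average is hypothetical; it mirrors the space, alphbt, k and temp variables from the question, using double instead of float.

```c
#include <stddef.h>

/* Hypothetical helper: applies the question's counting rules to a string.
   space starts at 1, every blank after the first non-blank adds 1,
   and one is subtracted at the end if the last character is a blank. */
double question_average(const char *s)
{
    int space = 1, alphbt = 0, k = 0;
    char temp = '\0';

    for (size_t i = 0; s[i] != '\0'; i++) {
        char ch = s[i];
        temp = ch;
        if (ch != ' ') {
            alphbt++;
            k++;              /* k != 0 once the first non-blank is seen */
        } else if (k != 0) {
            space++;          /* every blank is counted, even inside a run */
        }
    }

    if (temp == ' ')
        return (double)alphbt / (space - 1);
    return (double)alphbt / space;
}
```

For "abc  def" it counts 6 letters but 3 "spaces" (the initial 1 plus both blanks), so 6 / 3 = 2.0; with a single blank it gets 6 / 2 = 3.0. The problem is that each blank in a run is counted separately.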

Upvotes: 0

Views: 8233

Answers (3)

Xantix

Reputation: 3331

Consider the following input (hyphens represent spaces):

--Hello---World--

You currently ignore the leading and trailing spaces, but you count every space in the middle, even when several are next to each other. With a slight change to your program, in particular replacing 'k' with a flag that remembers whether the previous character was a space, we can handle this case.

#include <stdio.h>
#include <conio.h>
#include <stdbool.h>
int main()
{
  char ch;
  float avg;
  int numWords = 0;
  int numLetters = 0;
  bool prevWasASpace = true; //spaces at beginning are ignored

  printf("Enter a sentence: ");

  while((ch = getchar()) != '\n')
  {
      if( ch != ' ')
      {
         prevWasASpace = false;
         numLetters++;
      }  
      else if(!prevWasASpace)
      {
         numWords++;            //the first space after a word ends that word
         prevWasASpace = true;  //further spaces in the same run are ignored
      }
  } 

  if (!prevWasASpace)           //the last word had no trailing space
      numWords++;

  if (numWords > 0)
  {
      avg = numLetters / (float)(numWords);
      printf("Average word length: %.1f",avg);
  }

  getch();
}          

You may need to modify the preceding slightly (haven't tested it).
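The same flag idea can be checked against the --Hello---World-- example by running it over a string rather than standard input. This is a sketch: average_word_length is a hypothetical name, it treats only ' ' as a separator (like the code above), and it returns 0.0 when there are no words.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: average word length where words are separated by
   one or more ' ' characters; leading and trailing blanks are ignored. */
double average_word_length(const char *s)
{
    int numWords = 0;
    int numLetters = 0;
    bool prevWasASpace = true;   /* blanks at the start are ignored */

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != ' ') {
            prevWasASpace = false;
            numLetters++;
        } else if (!prevWasASpace) {
            numWords++;              /* first blank after a word ends it */
            prevWasASpace = true;    /* later blanks in the run are skipped */
        }
    }
    if (!prevWasASpace)              /* last word had no trailing blank */
        numWords++;

    return numWords > 0 ? (double)numLetters / numWords : 0.0;
}
```

With the hyphens replaced by spaces, "  Hello   World  " gives 10 letters over 2 words, i.e. 5.0, and "abc  def" gives 3.0.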

However, counting words based only on the spaces between them might not give you everything you want. Consider the following sentences:

John said, "Get the phone...Now!"

The TV announcer just offered a buy-1-get-1-free deal while saying they are open 24/7.

It wouldn't cost them more than $100.99/month (3,25 euro).

I'm calling (555) 555-5555 immediately on his/her phone.

A(n) = A(n-1) + A(n-2) -- in other words the sequence: 0,1,1,2,3,5, . . .

You will need to decide what constitutes a word, and that is not an easy question (btw, y'all, none of the examples included all varieties of English). Counting spaces is a pretty good estimate in English, but it won't get you all of the way.
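One way to keep that decision in a single place is to pass the definition of a "word character" in as a predicate. The sketch below is an illustration, not a real tokenizer; count_words, not_blank and alnum_only are hypothetical names. Treating every non-blank as part of a word reproduces the space-counting estimate, while treating only letters and digits as word characters splits phone...Now! into two words.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if c belongs to a word under some chosen definition. */
typedef int (*word_char_fn)(int c);

static int not_blank(int c)  { return !isspace(c); } /* words = non-blank runs */
static int alnum_only(int c) { return isalnum(c); }  /* words = letter/digit runs */

/* Counts maximal runs of characters for which is_word_char is true. */
size_t count_words(const char *s, word_char_fn is_word_char)
{
    size_t words = 0;
    int in_word = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        /* the cast keeps negative char values out of the <ctype.h> functions */
        if (is_word_char((unsigned char)s[i])) {
            if (!in_word)
                words++;
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
    return words;
}
```

For John said, "Get the phone...Now!" the first definition finds 5 words and the second finds 6; for buy-1-get-1-free they find 1 and 5. Neither is "right"; they are just different answers to the question above.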

Take a look at the Wikipedia page on Text Segmentation. The article uses the phrase "non-trivial" four times.

Upvotes: 2

Macattack

Reputation: 1990

Counting non-space characters is easy; your problem is counting words. Why count words by counting spaces, as you're doing? More importantly, what defines a word?

IMO a word begins at each transition from a space character to a non-space character. If you can detect those transitions, you know how many words you have, and your problem is solved.

I have an implementation; there are many possible ways to write one, and I don't think you'll have trouble coming up with your own. I may post mine later as an edit.

Edit: my implementation

#include <stdio.h>

int main()
{
    char ch;
    float avg;
    int words = 0;
    int letters = 0;
    int in_word = 0;

    printf("Enter a sentence: ");

    while((ch = getchar()) != '\n')
    {
        if(ch != ' ') {
            if (!in_word) {      // space -> non-space transition: a new word starts
                words++;
                in_word = 1;
            }
            letters++;
        }
        else {
            in_word = 0;
        }
    }

    if (words > 0)               // avoid dividing by zero on an empty line
        printf("Average word length: %.1f\n",avg = (float) letters/words);
    return 0;
}
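The transition rule can also be written without an in_word flag by comparing each character with the one before it: a word starts wherever a non-space follows a space or the start of the string. A sketch over a string, using the hypothetical name count_word_starts and ' ' as the only separator, as in the code above:

```c
#include <stddef.h>

/* Counts positions where a non-space character follows a space
   or the start of the string, i.e. the number of words. */
size_t count_word_starts(const char *s)
{
    size_t words = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int starts_word = s[i] != ' ' && (i == 0 || s[i - 1] == ' ');
        if (starts_word)
            words++;
    }
    return words;
}
```

This needs to look back at the previous character, which is easy in a string. When reading one character at a time with getchar(), the previous character is gone, so a flag such as in_word is the natural way to remember it.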

Upvotes: 2

Jonathan Leffler

Reputation: 753575

The counting logic is awry. This code seems to work correctly with leading and trailing blanks, multiple blanks between words, and so on. Note the use of int ch, so that the code can check for EOF accurately (getchar() returns an int).

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    int ch;
    int numWords = 0;
    int numLetters = 0;
    bool prevWasASpace = true; //spaces at beginning are ignored

    printf("Enter a sentence: ");
    while ((ch = getchar()) != EOF && ch != '\n')
    {
        if (ch == ' ')
            prevWasASpace = true;
        else
        {
            if (prevWasASpace)
                numWords++;
            prevWasASpace = false;
            numLetters++;
        }
    }

    if (numWords > 0)
    {
        double avg = numLetters / (float)(numWords);
        printf("Average word length: %.1f (C = %d, N = %d)\n", avg, numLetters, numWords);
    }
    else
        printf("You didn't enter any words\n");
    return 0;
}

Various example runs, using # to indicate where Return was hit.

Enter a sentence: A human in Algiers#
Average word length: 3.8 (C = 15, N = 4)

Enter a sentence: A human in Algiers  #
Average word length: 3.8 (C = 15, N = 4)

Enter a sentence:   A human  in   Algiers  #
Average word length: 3.8 (C = 15, N = 4)

Enter a sentence: #
You didn't enter any words

Enter a sentence: A human in AlgiersAverage word length: 3.8 (C = 15, N = 4)

Enter a sentence: You didn't enter any words

In the second-to-last example, I typed Control-D twice (the first to flush 'A human in Algiers' to the program, the second to send EOF), and once in the last example. Note that this code counts tabs as 'not space'; to handle tabs better, you'd need #include <ctype.h> and if (isspace(ch)) (or if (isblank(ch))) in place of if (ch == ' ').
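Here is a sketch of the isspace() change applied to a string, so it can be checked without typing tabs at a terminal; count_letters_words is a hypothetical name. With ch == ' ', the input A<tab>human in<tab><tab>Algiers would count the tabs as letters (C = 18, N = 2); with isspace() it gives the same C = 15, N = 4 as the runs above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* isspace() variant of the counting loop: tabs (and other white space)
   now separate words instead of being counted as letters. */
void count_letters_words(const char *s, int *numLetters, int *numWords)
{
    bool prevWasASpace = true;   /* white space at the start is ignored */
    *numLetters = 0;
    *numWords = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        /* isspace() needs a value in the unsigned char range (or EOF) */
        int ch = (unsigned char)s[i];
        if (isspace(ch))
            prevWasASpace = true;
        else {
            if (prevWasASpace)
                (*numWords)++;
            prevWasASpace = false;
            (*numLetters)++;
        }
    }
}
```

isblank() would accept only space and tab as separators, which is enough here because the loop in the answer stops at '\n' anyway.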


getchar() returns an int

I am confused why you have used int ch and EOF!

There are several parts to this answer.

  1. The first reason for using int ch is that the getchar() function returns an int. It can return any valid character plus a separate value EOF; therefore, its return value cannot be a char of any sort because it has to return more values than can fit in a char. It actually returns an int.

  2. Why does it matter? Suppose the value from getchar() is assigned to char ch. Now, for most characters, most of the time, it works OK. However, one of two things will happen. If plain char is a signed type, a valid character (often ÿ, y-umlaut, 0xFF, formally Unicode U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) is misrecognized as EOF. Alternatively, if plain char is an unsigned type, then you will never detect EOF.

  3. Why does detecting EOF matter? Because your input code can get EOF when you aren't expecting it to. If your loop is:

    int ch;
    
    while ((ch = getchar()) != '\n')
        ...
    

    and the input reaches EOF, the program is going to spend a long time doing nothing useful. The getchar() function will repeatedly return EOF, and EOF is not '\n', so the loop will try again. Always check for error conditions in input functions, whether the function is getchar(), scanf(), fread(), read() or any of their myriad relatives.
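A minimal sketch of a loop that stops on either newline or EOF. It reads with getc() from any FILE * (getc(stdin) behaves like getchar()) so it can be tried on a file with no trailing newline; read_line_stats is a hypothetical name. The essential parts are int ch and the != EOF test.

```c
#include <stdio.h>

/* Reads one line (or up to EOF) from fp, counting non-blank characters.
   Returns the count, or -1 if EOF was hit before any character was read. */
int read_line_stats(FILE *fp)
{
    int ch;            /* int, not char, so EOF stays distinguishable */
    int count = 0;
    int seen_any = 0;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        seen_any = 1;
        if (ch != ' ')
            count++;
    }
    if (ch == EOF && !seen_any)
        return -1;     /* nothing left to read (end of input or error) */
    return count;
}
```

Because ch is an int, a 0xFF byte comes back as 255 and is counted as an ordinary character, while a second call after the input is exhausted returns -1 instead of looping forever.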

Upvotes: 4
