Printing Word Length Histogram in C

Okay so I'm using Kernighan and Ritchie's "The C Programming Language" and I'm on exercise 1.13 and I can't seem to get this right. My program seems to not be printing much. The problem is as follows:

Exercise 1-13. Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.

Besides the creation of variables, here's my pseudocode for reading the input and storing what I want to store in the array.

  1. Create an array -- in this case, my array is of size 21 (21 elements, from 0 to 20) all assigned a value of 0 initially. It has 21 elements because I'm not going to use words that have more than 20 characters. I realize this is weird given no words have zero characters.
  2. Begin counting characters in input.
  3. If I encounter a space, tab, or newline character (i.e., this means the first word ended), stop.
  4. Depending on how many characters the word had, increment that particular position in the array (i.e., if the word had two characters add 1 to the element at position 2 in the array).
  5. Increment the wordCounter variable -- this variable as it's name indicates keeps track of the number of words that have been "read" in the input.
  6. Continue doing this to each word until EOF is reached.

Here's my pseudocode for printing the histogram (horizontally).

  1. For the first position, print the value stored in the first position of the array (i.e., 0) using tick marks |
  2. Do this for every element in the array.

Here's my code:

#include <stdio.h>

#define SIZEOFWORDSOFLENGTH 21

int main() {
    int wordsOfLength[SIZEOFWORDSOFLENGTH];
    int c, i, j;
    int lengthCounter = 0;

    /*Initializing all array elements to 0.*/
    for (i = 0; i < SIZEOFWORDSOFLENGTH; i++)
        wordsOfLength[i] = 0;

    /*Going through the input and counting.*/
    while ((c = getchar()) != EOF) {
        ++lengthCounter;
        if (c == ' ' || c == '\t' || c == '\n') {
            ++wordsOfLength[lengthCounter - 1];
            lengthCounter = 0;
        }
    }

    for (i = 0; i < SIZZEOFWORDSOFLENGTH; i++) {
        printf("Words of Length %d: ", i);
        /*The third argument of the following for loop was previously j = j*/
        for (j = 0; j < SIZEOFWORDSOFLENGTH; j++) {
            while (j < wordsOfLength[i]) {
                printf("|");
                /*Was previously j++ instead of break*/
                break;
            }
        }
        printf("\n");
    }
}

I debugged it by hand but I can't seem to find the problem. Maybe something really simple is going over my head. Also, I know this question has been asked before but I'm not trying to find a solution for the actual problem, I think my pseudocode is right if not somewhat right, I just want to know what's wrong with my code and maybe learn something. Thank you in advance.

Upvotes: 1

Views: 1547

Answers (2)

Jonathan Leffler
Jonathan Leffler

Reputation: 753525

As indicated in Ji-Young Park's answer, the reading loop has problems because it uses negative indexes into the array wordsOfLength. I would keep life simple and have wordsOfLength[i] store the number of words of length i, though it effectively wastes wordsOfLength[0]. I would use the macros from <ctype.h> to spot word boundaries, and I'd keep a record of whether I was in a word or not. You get credit for using int c.

int inword = 0;
while ((c = getchar()) != EOF)
{
    if (!isspace(c))
        lengthCounter++;
    else if (inword)
    {
        wordsOfLength[lengthCounter]++;
        lengthCounter = 0;
        inword = 0;
    }
}
if (inword)
    wordsOfLength[lengthCounter]++;

This code is not bamboozled by leading white space in the file. If you think there's any risk of reading 'antidisestablishmentarianism' (28) or 'floccinaucinihilipilification' (29) or other grotesquely long words, you should check on lengthCounter before blindly using it as an index, either dropping overlong words from the count or mapping them all to '20+ characters' class.

Your final triple loop is quite problematic too — it is currently:

for (i = 0; i < SIZZEOFWORDSOFLENGTH; i++) {
    printf("Words of Length %d: ", i);
    /*The third argument of the following for loop was previously j = j*/
    for (j = 0; j < SIZEOFWORDSOFLENGTH; j++) {
        while (j < wordsOfLength[i]) {
            printf("|");
            /*Was previously j++ instead of break*/
            break;
        }
    }
    printf("\n");
}

Under my scheme, I'd start with i = 1, but that isn't a major issue. I'd ensure that the first printf() printed for 2 digits to align the output for the counts of words of lengths 10-20.

The inner for loop should be constrained by wordsOfLength[i] rather than SIZEOFWORDSOFLENGTH, and the while loop is redundant (not least because you break it on the first iteration each time). You just need simple nested for loops:

for (i = 1; i < SIZZEOFWORDSOFLENGTH; i++)
{
    printf("Words of Length %2d: ", i);
    for (j = 0; j < wordsOfLength[i]; j++)
        printf("|");
    printf("\n");
}

The only issue now is if the maximum value in wordsOfLength is too long for comfort (for example, you've read the entire works of Shakespeare, so some of the words appear thousands of times).

Upvotes: 2

Ji-Young Park
Ji-Young Park

Reputation: 41

you don't need to substract '1' from lengthCounter in

++wordsOfLength[lengthCounter - '1'];

It should be

++wordsOfLength[lengthCounter];

Upvotes: 0

Related Questions