LifeLong21
LifeLong21

Reputation: 15

Why can't my C function handle punctuation?

I'm trying to do a simple scrabble game in the cs50 pset Week 2 and the function, int compute_score(string word), can't handle an input that uses punctuation, even though it's roughly the same as the correct answer using less lines of code by converting all input to uppercase. Here's the code below, but all you really need to look at is the function that I named above:

#include <ctype.h>
#include <cs50.h>
#include <stdio.h>
#include <string.h>

// Points assigned to each letter of the alphabet
int POINTS[] = { 1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10 };

int compute_score(string word);

int main()
{
    // Get input words from both players
    string word1 = get_string("Player 1: ");
    string word2 = get_string("Player 2: ");

    // Score both words
    int score1 = compute_score(word1);
    int score2 = compute_score(word2);

    // TODO: Print the winner
    if (score1 > score2)
    {
        printf("Player 1 wins!\n");
    }
    else if (score1 < score2)
    {
        printf("Player 2 wins!\n");
    }
    else
    {
        printf("Tie!\n");
    }
}

int compute_score(string word)
{
    // TODO: Compute and return score for string
    //Initialize score
    int score = 0;

    //Convert array of chars to uppercase and solve
    for (int i = 0, N = strlen(word); i < N; i++)
    {
        score = score + POINTS[toupper(word[i]) - 65];
    }
    return score;
}

Before I got to this point, I was having trouble using toupper on individual chars until I watched a video explaining the logic of using an ASCII chart and how to iterate chars in a string from the lecture. So inside the for loop, I wrote:

//Convert array of chars to uppercase and solve
    for (int i = 0, N = strlen(word); i < N; i++)
    {
        score = score + POINTS[toupper(word[i]) - 65];
    }
    return score;

I made the decision to convert the input to all uppercase because since characters like, A and g have the same value as their capitalized/non-capitalized counterpart, I thought it would just be simpler to convert it to uppercase so that the logic is simpler, faster, and more efficiently written. Made more sense in my head, too. However, when I use the check50 thing, everything gets put in the green EXCEPT for anything that has punctuation (with one exception at the end). Here's what the terminal test shows:

terminal results

Now I just don't understand this at all, because in my eyes, it's almost totally the same as the correct answer, which is this:

for (int i = 0, N = strlen(word); i < N; i++)
{
    if (isupper(word[i]))
    {
        score += POINTS[word[i] - 'A'];
    }
    else if (islower(word[i]))
    {
        score += POINTS[word[i] - 'a'];
    }
}

I have no idea why it's not working. I'm thinking that for some reason, it's grading the punctuation. That doesn't make sense though, because since toupper is designed to only work with alphabetical characters, it should be excluding special characters, rendering their value to zero. Does anyone have any suggestions for what's going wrong?

Upvotes: 2

Views: 195

Answers (4)

chux
chux

Reputation: 154243

Why can't my C function handle punctuation?

Because code uses POINTS[toupper(word[i]) - 65]; and that attempts array access outside POINTS[] when word[i] is a punctuation character. toupper(word[i]) for non-letters return word[i] leading to an array index outside [0...25]. This eventually causes undefined behavior (UB).


An alternative is to specify POINTS[] for all characters.

// Non-specified initialized result in zeroes for remaining elements.
int points[UCHAR_MAX + 1] = { ['A'] = 1, ['a'] = 1, ['B'] = 3, ['b'] = 3,
    // 23 more pairs
    ['Z'] = 10, ['z'] = 10};

...
const unsigned char *ch = (const unsigned char *) word;
while (*ch) {
  score += points[*ch];
  ch++;
}

No need for an int array to store small values. A narrower type (preferably of the same sign-ness as score) will suffice.

// int points[UCHAR_MAX + 1] = { ['A'] = 1, ['a'] = 1, ['B'] = 3, ...
signed char points[UCHAR_MAX + 1] = { ['A'] = 1, ['a'] = 1, ['B'] = 3, ...

Upvotes: 0

Eric Postpischil
Eric Postpischil

Reputation: 224312

… toupper is designed to only work with alphabetical characters, it should be excluding special characters, rendering their value to zero.

toupper does not change non-alphabetic characters. toupper('!') evaluates to the same value as '!', not to zero. toupper is specified in C 2018 76.4.2.2, in which paragraph 3 says (bold added):

If the argument is a character for which islower is true and there are one or more corresponding characters, as specified by the current locale, for which isupper is true, the toupper function returns one of the corresponding characters (always the same one for any given locale); otherwise, the argument is returned unchanged.

Even if toupper did produce zero for non-alphabetic characters, your expression POINTS[toupper(word[i]) - 65] would then evaluate to POINTS[0 - 65], which is not what you want. (Also, never write 65 for 'A' except in special circumstances such as translating between character sets. In ordinary source code, use 'A' for the code for “A”.)

You must write code to test for non-alphabetic characters and not add to the score for them.

(Also note that using x - 'A' is not a fully portable way to convert a letter to a value from 0 to 25, as the C standard does not require the character codes for letters to be consecutive. It is okay for simple student projects and situations where you know ASCII is used, but fully portable solutions need additional work.)

Upvotes: 3

Tim Randall
Tim Randall

Reputation: 4155

When you pass a punctuation character to toupper(), it is going to return that value unchanged.

Your code then uses it as an array index into the POINTS array, which only has 26 entries... leading to undefined behavior.

You'd want to modify the code so that it skips over any character that isn't either an uppercase or lowercase letter. That's the key difference between your attempt and the 'correct' code.

In the context of a Scrabble game, though, the best thing to do might be to "bail out" of compute_score() with a return value of zero, since punctuation characters are not valid.

This is how I'd fix it:

int compute_score(string word)
{
    // TODO: Compute and return score for string
    //Initialize score
    int score = 0;

    //Convert array of chars to uppercase and solve
    for (int i = 0, N = strlen(word); i < N; i++)
    {
        if(!isalpha(word[i]))
        {
            // not a valid scrabble word
            return 0;
        }
        score = score + POINTS[toupper(word[i]) - 65];
    }
    return score;
}

Upvotes: 1

Vlad from Moscow
Vlad from Moscow

Reputation: 311126

From the C Standard (7.4.1.2 The isalpha function)

2 The isalpha function tests for any character for which isupper or islower is true, or any character that is one of a locale-specific set of alphabetic characters for which none of iscntrl, isdigit, ispunct, or isspace is true.200) In the "C" locale, isalpha returns true only for the characters for which isupper or islower is true.

and (7.4.1.7 The islower function)

2 The islower function tests for any character that is a lowercase letter or is one of a locale-specific set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the "C" locale, islower returns true only for the lowercase letters (as defined in 5.2.1).

and (7.4.1.11 The isupper function)

2 The isupper function tests for any character that is an uppercase letter or is one of a locale-specific set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the "C" locale, isupper returns true only for the uppercase letters (as defined in 5.2.1).

So in this code snippet

for (int i = 0, N = strlen(word); i < N; i++)
{
    if (isupper(word[i]))
    {
        score += POINTS[word[i] - 'A'];
    }
    else if (islower(word[i]))
    {
        score += POINTS[word[i] - 'a'];
    }
}

there are processed only letters in the "C" locale.

In this code snippet

//Convert array of chars to uppercase and solve
for (int i = 0, N = strlen(word); i < N; i++)
{
    score = score + POINTS[toupper(word[i]) - 65];
}

there are processed all symbols.

You could write for example

//Convert array of chars to uppercase and solve
for (int i = 0, N = strlen(word); i < N; i++)
{
    if ( isalpha( ( unsigned char )word[i] ) )
    {
        score = score + POINTS[toupper(word[i]) - 'A'];
    }
}

Upvotes: 0

Related Questions