user1889966

Reputation: 165

toupper function

I am wondering how the toupper() function in C works. I am trying it out in the code below but I'm definitely doing something wrong. The code compiles, but the arguments passed into toupper() are not being capitalized...

char **copyArgs(int argc, char **argv) {
    char **a = malloc(sizeof(char *) * (argc));

    int i;
    for(i = 0; i < argc; i++) {
        int size = strlen(argv[i]);
        a[i] = malloc(sizeof(char) * (size + 1));
        strcpy(a[i], argv[i]);
        a[i] = toupper(a[i]);
    }
    return a;
}

If I test this with "one two" it results in "one two", not "ONE TWO". Any advice is appreciated.

Upvotes: 0

Views: 2924

Answers (1)

Matteo Italia

Reputation: 126867

toupper converts a single character to uppercase and returns the result. In your code you are passing it a pointer (a[i] is a char *) instead of a char, and C's permissive implicit conversions let that compile, so it cannot work as intended. Your compiler is probably emitting warnings about implicit conversions between pointer and integer without a cast (one for the argument, one for assigning the int result back to a[i]): that is a strong sign that you are doing something very wrong.

The whole thing doesn't blow up only because on your platform int is as big as a pointer (or, at least, big enough for the pointers you are using); toupper tries to interpret that int as a character, finds that it is non-alphabetic and returns it unmodified. That's sheer luck: on other platforms your program would probably crash, both because the pointer is truncated when it is converted to int and because the behavior of toupper is undefined for any int that is neither EOF nor representable as an unsigned char.

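To convert a whole string to uppercase, you have to iterate over all its chars, call toupper on each of them and store the returned value back, since toupper does not modify anything in place. You can easily write a function that does this:

#include <ctype.h>

/* Uppercase a NUL-terminated string in place. */
void strtoupper(char *str)
{
    for (; *str != '\0'; str++)
        *str = (char)toupper((unsigned char)*str);
}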
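Applied to the code in the question, a minimal sketch of copyArgs could look like the following. The helper name upper_in_place and the error handling are my own additions, not something the question asked for, and the sketch assumes argc > 0.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: uppercase a NUL-terminated string in place. */
static void upper_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Return a newly allocated, uppercased copy of argv[0..argc-1].
   Assumes argc > 0. Returns NULL if an allocation fails; otherwise
   the caller frees each string and then the array itself. */
char **copyArgs(int argc, char **argv)
{
    char **a = malloc(sizeof *a * (size_t)argc);
    if (a == NULL)
        return NULL;

    for (int i = 0; i < argc; i++) {
        a[i] = malloc(strlen(argv[i]) + 1);
        if (a[i] == NULL) {
            /* Undo the allocations made so far. */
            while (i-- > 0)
                free(a[i]);
            free(a);
            return NULL;
        }
        strcpy(a[i], argv[i]);
        upper_in_place(a[i]);   /* convert the copy, not the pointer */
    }
    return a;
}
```

The only change that matters for the original problem is the last line of the loop body: the copy is uppercased character by character instead of assigning the result of toupper to the pointer a[i].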

Notice the unsigned char cast: all the <ctype.h> functions dealing with character classification and conversion require an int that is either EOF (which toupper returns unchanged) or the value of an unsigned char. Plain char may be signed, so without the cast a byte above 0x7F could be passed as a negative int. The reason is sad and complex, and I already detailed it in another answer.
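To make the effect of the cast concrete, here is a small sketch. Both function names are hypothetical, made up for this illustration. The first returns the value that actually reaches toupper once the cast is applied; the second is a wrapper that accepts a plain char so the cast cannot be forgotten at the call site.

```c
#include <ctype.h>

/* Hypothetical helper: the int value that is handed to toupper when
   the argument is first converted to unsigned char. The result is
   always in the range 0..UCHAR_MAX, whether or not plain char is
   signed on the platform. */
int as_ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical wrapper: takes a plain char, does the cast itself,
   and hands back the converted character as a char. */
char upper_char(char c)
{
    return (char)toupper((unsigned char)c);
}
```

On a platform where char is signed, the byte 0xE9 stored in a char is negative; as_ctype_arg still yields 233 for it, which is a value toupper is specified to accept.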

Still, it's worth noting that toupper by design cannot work reliably with multibyte character encodings (such as UTF-8), because it only ever sees one byte at a time; it therefore has no real place in modern text processing. The same goes for most of the C locale facilities, which were (badly) designed in another era.
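As a rough illustration of the multibyte problem, the sketch below uppercases a buffer byte by byte and counts how many bytes changed. The function name upper_bytes is hypothetical, and the behavior described afterwards assumes the default "C" locale (that is, the program never calls setlocale).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase a NUL-terminated buffer one byte at
   a time and return the number of bytes that were actually changed. */
size_t upper_bytes(char *s)
{
    size_t changed = 0;

    for (; *s != '\0'; s++) {
        char u = (char)toupper((unsigned char)*s);
        if (u != *s) {
            *s = u;
            changed++;
        }
    }
    return changed;
}
```

Given the UTF-8 encoding of "café" (the bytes 'c', 'a', 'f', 0xC3, 0xA9), only the three ASCII letters are converted in the "C" locale. The two bytes that encode "é" are not letters as far as toupper is concerned, so the result is "CAFé" rather than "CAFÉ".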

Upvotes: 5
