Destructor

Reputation: 14438

How does this custom toupper() function work?

I've seen the following program, which uses a custom toupper() function.

#include <stdio.h> 
void my_toUpper(char* str, int index)
{
    *(str + index) &= ~32;
}
int main()
{
    char arr[] = "geeksquiz";
    my_toUpper(arr, 0);
    my_toUpper(arr, 5);
    printf("%s", arr);
    return 0;
}

How exactly does this function work? I can't understand the logic behind it. It would be good if someone could explain it in simple terms.

Upvotes: 0

Views: 1505

Answers (2)

Steve Summit

Reputation: 48020

To understand this, we have to look at the ASCII representations of letters. It's easiest to do this in base 2.

A  01000001        a  01100001
B  01000010        b  01100010
C  01000011        c  01100011
D  01000100        d  01100100
   ...                ...
X  01011000        x  01111000
Y  01011001        y  01111001
Z  01011010        z  01111010

Notice that the upper-case letters all begin with 010, and the lower-case letters all begin with 011. Notice that the lower-order bits are all the same for the upper- and lower-case versions of the same letter.

So: all we need to do to convert a lower-case letter to the corresponding upper-case letter is to change the 011 to 010, or in other words, turn off the 00100000 bit.

Now, the standard way to turn off a bit is to do a bitwise AND of a mask with a 0 in the position of the bit you want to turn off, and 1's everywhere else. So the mask we want is 11011111. We could write that as 0xdf, but the programmer in this example has chosen to emphasize that it's a complementary mask to 00100000 by writing ~32. 32 in binary is 00100000.

This technique works fine, except that it will do strange things with non-letters. For example, it will turn '{' into '[' (because they have the ASCII codes 01111011 and 01011011, respectively). It will turn an asterisk '*' into a newline '\n' (00101010 into 00001010).

The other way of converting lower to upper case in ASCII is to subtract 32. That, also, will convert 'a' to 'A' (97 to 65, in decimal), but it would also convert, for example, 'A' to '!'. The bitwise AND technique is actually advantageous in this case because it converts 'A' to 'A' (which is what a convert-to-uppercase routine ought to do).

The bottom line is that whether you AND with ~32 or subtract 32, in a properly safe function you're going to have to also check that the character being converted is the right kind of letter to begin with.

Also, it's worth noting that this technique absolutely assumes the 7-bit ASCII character set, and will not work with accented or non-Roman letters of other character sets, such as ISO-8859 or Unicode. (EBCDIC would be another matter.)

Upvotes: 2

Sourav Ghosh

Reputation: 134356

Following the ASCII table, to convert a letter from lowercase to UPPERCASE, you need to subtract 32 from the ASCII value of the lowercase letter.

For the ASCII values representing the lowercase letters, subtracting 32 is equivalent to ANDing with ~32. That is what is being done in

 *(str + index) &= ~32;

It takes the value of the element at position index in str, subtracts 32 from it (the bitwise AND with ~32 clears that particular bit), and stores the result back at the same index.

FWIW, this is a special case where "resetting" a particular bit gives the same result as actually subtracting 32. This "subtraction" works here because of the particular bit representation of the lowercase letter ASCII values. As mentioned in the comments, this is not a general way of subtracting: the "resetting" logic does not work as subtraction for arbitrary values.

Regarding the operators used,

  • &= is assignment by bitwise AND
  • ~ is bitwise NOT.

Note: This custom function does not check whether the character at the given index of str is actually a lowercase letter. You need to take care of that yourself.

Upvotes: 3
