Dmitriy

Reputation: 5497

Why doesn't switching on a char value reach case 0xC2?

I found a bug in the code below. It parses a C string and is supposed to detect UTF-8 characters:

char* pTmp = ...;
...
switch (*pTmp)
{
    case 'o':
    {
        ...     // works fine
        break;
    }   
    case 0xC2:
    {
        ...     // never gets triggered
        break;
    }
}

However, case 0xC2: is never triggered.

My assumption is that 0xC2 is treated as an int and is therefore 194, which is bigger than 127, the maximum value for the char data type. So -62 != 194.

Or maybe some overflow or integer promotion is happening here.

Writing switch ((unsigned char)*pTmp) fixes the issue.

But I would like to clarify what is really going on here and what rules are applied.

I'm also open to changing the title; nothing better came to mind.

Upvotes: 2

Views: 326

Answers (2)

chux

Reputation: 154175

Is char signed?

It is implementation-defined whether char has the same range as signed char or unsigned char. In OP's case, char has the range [-128 ... 127], so the promoted value of *pTmp can never equal 194 and case 0xC2: is never matched.
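A minimal sketch of the effect, assuming nothing beyond standard C. The helper names (plain_char_is_signed, matches_via_plain_char, matches_via_unsigned_char) are made up for illustration and are not from OP's code. The byte is copied into an int first so the comparison mirrors what the switch does after integer promotion.

```c
#include <limits.h>
#include <stdbool.h>

/* True when plain char is signed on this implementation. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Reads the byte through plain char, as in the question.
   With a signed 2's complement char, the byte 0xC2 is read as -62. */
bool matches_via_plain_char(const char *p)
{
    int value = *p;          /* integer promotion keeps the sign */
    return value == 0xC2;    /* 0xC2 is the int 194 */
}

/* Reads the same byte through unsigned char, so it is always 0..UCHAR_MAX. */
bool matches_via_unsigned_char(const char *p)
{
    int value = *(const unsigned char *)p;
    return value == 0xC2;
}
```

On an implementation where plain_char_is_signed() is true, matches_via_plain_char("\xC2\xA9") is false while matches_via_unsigned_char("\xC2\xA9") is true.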


But I would like to clarify what is really going on here and what rules are applied.

The C standard library string functions have many parameters that are char *, yet those library functions internally act as if they are pointing to unsigned char data:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3

To match that, OP's code should do likewise. Doing so will also allow *upTmp to potentially match 0xC2.

char* pTmp = ...;
unsigned char* upTmp = ( unsigned char*) pTmp;

switch (*upTmp)

   case 0xC2:

As an alternative to the hexadecimal constant 0xC2, use a character constant, '\xC2', which has the value a char holding that byte would have after conversion to int, so it matches the range of char. @Eric Postpischil.
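A short sketch of the character-constant variant. The function name is_c2_lead_byte is hypothetical. The switch still reads through plain char, and the label '\xC2' takes the same value that a char holding the byte 0xC2 has after conversion to int, whether char is signed or unsigned.

```c
/* Hypothetical helper: returns 1 if *p is the byte 0xC2, otherwise 0. */
int is_c2_lead_byte(const char *p)
{
    switch (*p)
    {
        case '\xC2':    /* negative when char is signed, 194 when unsigned */
            return 1;
        default:
            return 0;
    }
}
```

This keeps the operand type unchanged, at the cost of writing every non-ASCII label as a character constant.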


[Pedantic]

"switch ((unsigned char)*pTmp) fixes the issue." - it is close enough.

This "fix" works with a 2's complement signed char, as well as when the implementation-defined char matches unsigned char.

For the remaining, all but non-existent, cases where char is signed and not 2's complement, the fix is wrong: the characters should be accessed via unsigned char *, otherwise the wrong value is used.

switch (*(unsigned char *)pTmp) works correctly in all cases.
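For illustration, the pointer-cast form could sit in a scanning loop like the one below. This is only a sketch: count_c2_bytes is a made-up name, and counting 0xC2 bytes stands in for whatever OP's parser actually does in that case.

```c
#include <stddef.h>

/* Hypothetical helper: counts the bytes equal to 0xC2 in a C string. */
size_t count_c2_bytes(const char *s)
{
    size_t count = 0;

    /* Walk the string through unsigned char *, so each byte is 0..UCHAR_MAX. */
    for (const unsigned char *up = (const unsigned char *)s; *up != '\0'; ++up)
    {
        switch (*up)
        {
            case 0xC2:      /* reachable: *up promotes to the int 194 */
                ++count;
                break;
            default:
                break;
        }
    }
    return count;
}
```

Converting the pointer once, rather than casting each value, means every read uses the unsigned char object representation.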

Upvotes: 3

Zoso

Reputation: 3465

I get this warning after adding -Wall here:

warning: case label value exceeds maximum value for type [-Wswitch-outside-range]
   14 |             case 0xC2:

So yes, your reasoning is correct.

Upvotes: 4
