Reputation: 5497
Found a bug in the code below. It parses a C string and is supposed to detect UTF-8 characters:
char* pTmp = ...;
...
switch (*pTmp)
{
case 'o':
{
... // works fine
break;
}
case 0xC2:
{
... // never gets triggered
break;
}
}
However, case 0xC2: is never triggered.
My assumption is that 0xC2 is treated as an int with the value 194, which is bigger than 127, the maximum value of the char data type, so -62 != 194.
Or maybe some overflow or integer promotion is happening here.
Writing switch ((unsigned char)*pTmp) fixes the issue.
But I would like to clarify what is really going on here and what rules are applied.
I'm also open to changing the title; nothing better came to mind.
Upvotes: 2
Views: 326
Reputation: 154175
Is char signed?
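Whether char has the same range as signed char or unsigned char is implementation-defined. In OP's case, char has the range [-128 ... 127]. The byte 0xC2 is read as -62 and stays -62 when promoted to int, while the constant 0xC2 is an int with the value 194, thus case 0xC2: is never matched.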
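As a minimal sketch of the comparison described above (the helper names are hypothetical, not from the question), the following shows the two ways of reading the same byte. It assumes an 8-bit char; whether the first helper matches depends on whether the implementation's char is signed.

```c
#include <limits.h>

/* Hypothetical helper: compares the byte the way the original switch does.
   The char value is promoted to int before the comparison. */
int matches_as_plain_char(char c)
{
    int promoted = c;          /* -62 if char is signed, 194 if unsigned */
    return promoted == 0xC2;   /* 0xC2 is the int 194 */
}

/* Hypothetical helper: converts to unsigned char first, so the value
   compared is always in the range 0 ... UCHAR_MAX. */
int matches_as_unsigned_char(char c)
{
    int promoted = (unsigned char)c;
    return promoted == 0xC2;
}
```

With the byte taken from the string "\xC2", matches_as_unsigned_char() reports a match on any implementation, while matches_as_plain_char() only does so where CHAR_MIN is 0, that is, where char is unsigned.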
But I would like to clarify what is really going on here and what rules are applied.
The C standard library string functions have many parameters that are char *, yet those library functions internally act as if they are pointing to unsigned char data:
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3
To match that, OP's code should do likewise. Doing so also allows *upTmp to match 0xC2.
char* pTmp = ...;
unsigned char* upTmp = ( unsigned char*) pTmp;
switch (*upTmp)
case 0xC2:
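For completeness, here is one way the fragment above could look as a whole function. This is only a sketch: the enum, its constants, and the name classify_byte() are made up for illustration and stand in for the elided parts of the original code.

```c
/* Hypothetical result type, not from the original post. */
enum byte_kind { KIND_OTHER, KIND_LETTER_O, KIND_LEAD_C2 };

/* Classifies the first byte of a C string, reading it through an
   unsigned char pointer as the string functions are specified to do. */
enum byte_kind classify_byte(const char *pTmp)
{
    const unsigned char *upTmp = (const unsigned char *)pTmp;

    switch (*upTmp)
    {
    case 'o':
        return KIND_LETTER_O;
    case 0xC2:               /* reachable: *upTmp can hold 194 */
        return KIND_LEAD_C2;
    default:
        return KIND_OTHER;
    }
}
```

Calling classify_byte("\xC2\xA9") yields KIND_LEAD_C2 regardless of whether plain char is signed, because the switch operand is never negative.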
Alternative: instead of the hexadecimal constant 0xC2, use the character constant '\xC2', whose value lies in the range of char. @Eric Postpischil.
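A small sketch of the character-constant variant, keeping the switch on plain char. The function name is_c2_lead() is hypothetical. In C, '\xC2' has type int but takes the value that a char with that bit pattern would have, so it follows the signedness of char on the implementation at hand.

```c
/* Hypothetical helper: returns 1 if the first byte of the string is 0xC2. */
int is_c2_lead(const char *pTmp)
{
    switch (*pTmp)           /* plain char, promoted to int */
    {
    case '\xC2':             /* -62 where char is signed, 194 where unsigned */
        return 1;
    default:
        return 0;
    }
}
```

The trade-off is that the label no longer reads as the raw byte value, so a comment naming the byte helps when the code deals with UTF-8 lead bytes.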
[Pedantic]
"switch ((unsigned char)*pTmp) fixes the issue." - it is close enough.
This "fix" works when char is signed and uses 2's complement, as well as when the implementation-defined char matches unsigned char.
For the remaining, all but non-existent, cases where char is signed and not 2's complement, the fix is wrong: the characters should be accessed via unsigned char *, else the wrong value is used.
switch (*(unsigned char *)pTmp) works correctly in all cases.
Upvotes: 3