Virus721

Reputation: 8335

C / C++ - signed char comparing

I'm trying to skip the BOM in a UTF-8 encoded file, but the tests in my if statements fail:

int i = 0;

if( str[i] == '\0xef' ) {
    ++i;
}

if( str[1] == '\0xbb' ) {
    ++i;
}

if( str[2] == '\0xbf' ) {
    ++i;
}

I don't know why they don't work. There must be some kind of implicit conversion between signed and unsigned, and Visual Studio displays the character codes as 2 bytes while debugging, even though I'm using 1-byte chars.

What's going on? Thank you :)

Upvotes: 0

Views: 228

Answers (3)

Marian

Reputation: 7482

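The right way to write a hex character constant is '\xef'. What you wrote, '\0xef', is a multi-character constant: the octal escape \0 followed by the letters x, e and f. Your str can stay signed in this case.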

if( str[i] == '\xef' ) {
    ++i;
}

Or you can define str as unsigned char * and compare against integer 0xef (as proposed by Chinna).
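For the second option, here is a minimal sketch of a complete BOM check that reads the bytes through an unsigned char pointer. The function name utf8_bom_length and the len parameter are assumptions made for illustration; they are not from the question.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes to skip,
   3 if the buffer starts with the UTF-8 BOM (EF BB BF), otherwise 0. */
static size_t utf8_bom_length(const char *str, size_t len)
{
    /* Read the bytes as unsigned so that 0xEF compares as 239, not -17. */
    const unsigned char *p = (const unsigned char *)str;

    /* Check the length first, then all three bytes together. */
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        return 3;
    }
    return 0;
}
```

You would then start reading at str + utf8_bom_length(str, len). Checking the three bytes in one condition avoids skipping a partial match.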

Upvotes: 1

Sebastian Redl

Reputation: 72062

The problem is that the constant is interpreted as an integer in this context, and thus gets the value 239, while the character is sign-extended. Because char is signed on most x86 compilers (including Visual Studio) the bit-pattern 0xef is interpreted as a negative number and thus, when extended, yields -17. Those two numbers are not equal.

Try doing it this way: if ((unsigned char)str[i] == 0xef).
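To see the promotion at work, here is a small sketch comparing the same byte three ways. The helper names are made up for illustration, and the behaviour described in the comments assumes an implementation where converting 0xef to a signed char yields -17, as on Visual Studio and other mainstream x86 compilers.

```c
#include <limits.h>

/* Hypothetical helpers for illustration; each returns 1 when the comparison holds. */

/* Reports whether plain char is signed on this compiler. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* char vs. character constant: both sides carry the same value after promotion. */
static int equals_char_constant(char c)
{
    return c == '\xef';
}

/* char vs. integer constant: 0xef is the int 239, so a signed char
   holding the same bit pattern (promoted to -17) does not match. */
static int equals_int_constant(char c)
{
    return c == 0xef;
}

/* Cast first: the byte is promoted from unsigned char, giving 239. */
static int equals_int_constant_after_cast(char c)
{
    return (unsigned char)c == 0xef;
}
```

With char c = (char)0xef, only the second comparison depends on whether plain char is signed; the other two hold either way.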

Upvotes: 1

Chinna

Reputation: 4002

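Your code should be:

int i = 0;

/* The casts make the comparisons hold even when plain char is signed. */
if( (unsigned char)str[i] == 0xef ) {
    ++i;
}

if( (unsigned char)str[1] == 0xbb ) {
    ++i;
}

if( (unsigned char)str[2] == 0xbf ) {
    ++i;
}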

Upvotes: 3
