Reputation: 8335
I'm trying to skip the BOM in a UTF-8 encoded file, but my if tests fail:
int i = 0;
if( str[i] == '\0xef' ) {
    ++i;
}
if( str[1] == '\0xbb' ) {
    ++i;
}
if( str[2] == '\0xbf' ) {
    ++i;
}
I don't know why they don't work. There must be some kind of implicit conversion between signed and unsigned going on, and Visual Studio displays the character codes as 2 bytes while debugging, even though I'm using 1-byte chars.
What's going on? Thank you :)
Upvotes: 0
Views: 228
Reputation: 7482
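The right way to write a hex character constant is '\xef'. Your '\0xef' is a multi-character constant (the four characters '\0', 'x', 'e', 'f'), not the single byte 0xEF. Your str should stay a plain (signed) char in this case.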
if( str[i] == '\xef' ) {
    ++i;
}
Or you can define str as unsigned char * and compare against the integer 0xef (as proposed by Chinna).
Upvotes: 1
Reputation: 72062
The problem is that the constant 0xef is interpreted as an integer in this context, and thus has the value 239, while the character is sign-extended. Because char is signed on most x86 compilers (including Visual Studio), the bit pattern 0xef is interpreted as a negative number and thus, when extended to int, yields -17. Those two numbers are not equal.
Try doing it this way: if ((unsigned char)str[i] == 0xef).
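To see the two values side by side, here is a minimal sketch in C. The helper names promoted_value and byte_value are made up for illustration and are not from the question.

```c
/* Hypothetical helpers, for illustration only. */

/* Value of a byte after the usual promotion of plain char to int.
   Where char is signed (e.g. Visual Studio on x86), bytes >= 0x80
   come out negative. */
static int promoted_value(char c)
{
    return c;
}

/* Value of the same byte read as unsigned char: always in 0..255,
   so it can be compared safely against constants such as 0xef. */
static int byte_value(char c)
{
    return (unsigned char)c;
}
```

With a buffer that starts with the byte 0xEF, byte_value(str[0]) gives 239 on any platform, while promoted_value(str[0]) gives -17 wherever char is signed, which is why the plain comparison against 0xef fails there.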
Upvotes: 1
Reputation: 4002
Your code should be
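int i = 0;
/* The casts make the comparisons work even if str is a plain (signed) char buffer. */
if( (unsigned char)str[i] == 0xef ) {
    ++i;
}
if( (unsigned char)str[1] == 0xbb ) {
    ++i;
}
if( (unsigned char)str[2] == 0xbf ) {
    ++i;
}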
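If you would rather test all three bytes at once, something along these lines might do. This is only a sketch: the function name utf8_bom_length and the explicit len parameter are my own assumptions, not part of the original code. memcmp compares the bytes as unsigned char, so the signedness of char does not matter here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the question): returns 3 if buf
   starts with the UTF-8 BOM (EF BB BF), otherwise 0. */
static size_t utf8_bom_length(const char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    /* Only look at the first three bytes if the buffer has them. */
    if (len >= sizeof bom && memcmp(buf, bom, sizeof bom) == 0) {
        return sizeof bom;
    }
    return 0;
}
```

You would then start reading at str + utf8_bom_length(str, len), which skips the BOM only when all three bytes are present.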
Upvotes: 3