Reputation: 167
I wrote a single Cyrillic character, А, to a file. Its hex representation in UTF-8 encoding is 0xD090 (the bytes 0xD0 0x90). Then I read the file's contents, but for some reason the result was different.
Here's the function I'm using to print the binary representation of a number:
    char* bstr32(int val) {
        char* res = calloc(33, 1);
        for (int a = 31; a >= 0; a--) {
            res[a] = val & 1 ? '1' : '0';
            val >>= 1;
        }
        return res;
    }
This is how I read the data from the file (the file size is hardcoded and error checks are omitted):
    int main() {
        unsigned char buff[2];
        read(open("data", O_RDONLY), buff, 2);
        puts(bstr32(*((unsigned short*) buff)));
    }
Output: 00000000000000001001000011010000
The same value, but defined in code:
    int main() {
        puts(bstr32(0xD090));
    }
Output: 00000000000000001101000010010000
I figured out that the data read from the file is in little-endian format, and that using htonl produces the correct result. Why is the result different, given that bitwise operators are endian-independent?
Upvotes: 1
Views: 178
Reputation: 222302
In puts(bstr32(*((unsigned short*) buff))), you take two bytes in memory and load them as an unsigned short, which, in your C implementation, puts the low-addressed byte in the low bits of the unsigned short. That value is then displayed with its high bits first, so the low-addressed byte (0xD0) is printed last.
So the bytes are in the file in the same order they were originally in memory, but your attempt to display them through a short presented them in reverse order.
Upvotes: 2