zexed640

Reputation: 167

Byte order of file content

I wrote a single Cyrillic character, А (U+0410), into a file. Its UTF-8 encoding is the two bytes 0xD0 0x90, which I expected to read back as 0xD090. Then I read the file's content, but for some reason the result was different.

Here's the function I'm using to print the binary representation of a number:

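#include <stdlib.h>

/* Returns a heap-allocated string of 32 '0'/'1' characters,
   most significant bit first. The caller should free() it. */
char* bstr32(int val) {
    char* res = calloc(33, 1);  /* 32 digits plus the terminating NUL */

    for (int a = 31; a >= 0; a--) {
        res[a] = val & 1 ? '1' : '0';  /* lowest bit goes to the rightmost slot */
        val >>= 1;
    }

    return res;
}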

This is how I read the data from the file (the file size is hardcoded and error checks are omitted):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    unsigned char buff[2];

    read(open("data", O_RDONLY), buff, 2);

    puts(bstr32(*((unsigned short*) buff)));
}

Output: 00000000000000001001000011010000

The same value, but defined in code:

int main() {
    puts(bstr32(0xD090));
}

Output: 00000000000000001101000010010000

I figured out that the data read from the file is in little-endian format, and that using htonl will produce the correct result. Why is the result different, given that bitwise operators are endian-independent?

Upvotes: 1

Views: 178

Answers (1)

Eric Postpischil

Reputation: 222302

In puts(bstr32(*((unsigned short*) buff))), you take two bytes in memory and load them as a short. Your C implementation is little-endian, so the low-addressed byte (0xD0) goes into the low bits of the short and the high-addressed byte (0x90) goes into the high bits, giving 0x90D0. That short is then displayed with its high bits first.

So the bytes are in the file the same way they were originally in memory, but your attempt to display them using a short presented them in reverse order.

Upvotes: 2
