LastSecondsToLive

Reputation: 744

Endianness macro in C

I recently saw this post about endianness macros in C and I can't really wrap my head around the first answer.

Code supporting arbitrary byte orders, ready to be put into a file called order32.h:

#ifndef ORDER32_H
#define ORDER32_H

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)

#endif

You would check for little endian systems via

O32_HOST_ORDER == O32_LITTLE_ENDIAN

I do understand endianness in general. This is how I understand the code:

  1. Create example of little, middle and big endianness.
  2. Compare test case to examples of little, middle and big endianness and decide what type the host machine is of.

What I don't understand are the following aspects:

  1. Why is a union needed to store the test case? Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed? And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns a value to the union, but why the strange notation with two sets of braces?
  2. Why the check for CHAR_BIT? One comment mentions that it would be more useful to check UINT8_MAX. Why is char even used here, when it's not guaranteed to be 8 bits wide? Why not just use uint8_t? I found this link to Google-Devs GitHub. They don't rely on this check... Could someone please elaborate?

Upvotes: 3

Views: 1592

Answers (3)

DigitalRoss

Reputation: 146053

Why is a union needed to store the test case?

    The entire point of the test is to alias the array with the magic value the array will create.

Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed?

    Well, more or less. It will, but beyond exactly 32 bits there are no guarantees: uint32_t is an optional type, defined only on platforms that have a padding-free 32-bit unsigned type. It would fail only on some really fringe architecture you will never encounter.

And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the strange markup with two braces?

    The inner brace is for the array.

Why the check for CHAR_BIT?

    Because that's the actual guarantee. If that doesn't blow up, everything will work.

One comment mentions that it would be more useful to check UINT8_MAX? Why is char even used here, when it's not guaranteed to be 8 bits wide?

    Because in fact it always is, these days.

Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?

    Lots of other choices would work also.

Upvotes: 3

viraptor

Reputation: 34145

{{0, 1, 2, 3}} is the initializer for the union, which results in the bytes component being filled with [0, 1, 2, 3].

Now, since the bytes array and the uint32_t occupy the same space, you can read the same value as a native 32-bit integer. The value of that integer shows you how the array was shuffled - which really tells you which endianness your system uses.

There are only 3 popular possibilities here - O32_LITTLE_ENDIAN, O32_BIG_ENDIAN, and O32_PDP_ENDIAN.

As for the char / uint8_t - I don't know. I think it makes more sense to just use uint8_t with no checks.

Upvotes: 2

dbush

Reputation: 223689

The initialization has two sets of braces because the inner braces initialize the bytes array. So bytes[0] is 0, bytes[1] is 1, etc.

The union allows a uint32_t to lie on the same bytes as the char array and be interpreted in whatever the machine's endianness is. So if the machine is little endian, 0 is in the low order byte and 3 is in the high order byte of value. Conversely, if the machine is big endian, 0 is in the high order byte and 3 is in the low order byte of value.

Upvotes: 2
