Donotalo

Reputation: 13025

Endian independency in C

In a microcontroller project written in C, we defined the following macros to access the individual bytes of a multi-byte variable (4 bytes long):

#define BYTE_0(var)         (*((unsigned char*) &var))
#define BYTE_1(var)         (*(((unsigned char*) &var) + 1))
#define BYTE_2(var)         (*(((unsigned char*) &var) + 2))

BYTE_0() accesses the least significant byte, and so on. We do this because, when we need to access the bytes of a multi-byte variable separately (the micro is 8-bit), the macros above compile to fewer lines of assembly. As the code memory is only 15K, a few bytes are sometimes precious.

The micro we're using is little-endian. If we port the code to another micro with a big-endian architecture, will the code above still work? In other words, does the C standard guarantee that (*((unsigned char*) &var)) will give the least significant byte of var?

Upvotes: 2

Views: 758

Answers (5)

old_timer

Reputation: 71566

No

#define BYTE_0(var)         (*((unsigned char*) &var))

will give you whichever byte of var sits at its lowest address, and which byte that is depends on the processor/memory controller, not on the C language. If var were 0x12345678 you would see 0x12 on some systems (big-endian) and 0x78 on others (little-endian).
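The difference can be sketched as follows, assuming var is a uint32_t from <stdint.h> (the question only says "4 bytes long"). The helper names first_byte_in_memory, least_significant_byte and is_little_endian are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Returns the byte stored at the lowest address of a 32-bit value.
   Which byte of the value this is depends on the target's byte order. */
static uint8_t first_byte_in_memory(uint32_t value)
{
    return *(const unsigned char *)&value;
}

/* Returns the least significant byte, independent of byte order. */
static uint8_t least_significant_byte(uint32_t value)
{
    return (uint8_t)(value & 0xFFu);
}

/* Hypothetical run-time check: non-zero on a little-endian target. */
static int is_little_endian(void)
{
    const uint32_t probe = 1u;
    return first_byte_in_memory(probe) == 1u;
}
```

On a little-endian target the two accessors agree; on a big-endian one only least_significant_byte() still returns 0x78 for 0x12345678.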

#define BYTE_0(var)         ((var) & 0xFF)

gives you the least significant byte of var on any system, assuming var holds the same value on every system.

To complete the list:

#define BYTE_0(var)         (((var) >>  0) & 0xFF)
#define BYTE_1(var)         (((var) >>  8) & 0xFF)
#define BYTE_2(var)         (((var) >> 16) & 0xFF)
#define BYTE_3(var)         (((var) >> 24) & 0xFF)
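The same shift-and-mask idea can also be written as a function. This is a minimal sketch, assuming the variable is a uint32_t; split_u32 is a hypothetical name, not something from the original project.

```c
#include <stdint.h>

/* Hypothetical helper: stores the bytes of value in out[],
   least significant byte first, on any target. */
static void split_u32(uint32_t value, uint8_t out[4])
{
    unsigned i;
    for (i = 0; i < 4; i++) {
        /* Shift the wanted byte down to bits 0..7, then mask it off. */
        out[i] = (uint8_t)((value >> (8u * i)) & 0xFFu);
    }
}
```

Because it works on the value rather than on its address, the result does not depend on how the target stores the bytes in memory.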

Do NOT use bitfields instead of shifting and masking. Bitfields do not port from compiler to compiler, whether the endianness is the same or different. Bitfield layout is "implementation-defined", and every possible messed-up thing you can think of can be found out there in compilers.

Be careful when trying to reverse your question and build a variable from bytes:

var = (b3<<24)|(b2<<16)|(b1<<8)|(b0<<0);

If b3, b2, b1 and b0 are 8-bit variables, the compiler does not have to widen them to 32 bits before shifting. Standard C promotes them only to int, which is guaranteed to be just 16 bits wide, and some embedded compilers skip even that promotion. Where int is 32 bits, the line above usually places the four bytes into a 32-bit variable as intended (although b3 << 24 still overflows a signed int when the top bit of b3 is set). Where int is 16 bits, shifting by 16 or 24 is undefined, and on a compiler that shifts in 8 bits, b1 << 8 is already zero, so in the worst case you end up with

var = 0 | 0 | 0 | b0;

I prefer

var = 0;
var <<= 8; var |= b3;
var <<= 8; var |= b2;
var <<= 8; var |= b1;
var <<= 8; var |= b0;

or

var = b3;
var <<= 8; var |= b2;
var <<= 8; var |= b1;
var <<= 8; var |= b0;

Both port quite nicely, and the optimizer should give you similar or identical code to a single line of C with typedefs or a bitfield solution.
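For comparison, here is a sketch of both styles wrapped in functions, assuming var is a uint32_t and the bytes are uint8_t. The names make_u32 and u32_from_msb_first are hypothetical. The first keeps the single-expression form but casts every byte to uint32_t before shifting, so nothing is shifted in a narrower type; the second is the shift-and-OR sequence above written as a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Single expression: each byte is widened to 32 bits before the shift,
   so the result does not depend on the width of int. */
static uint32_t make_u32(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    return ((uint32_t)b3 << 24) |
           ((uint32_t)b2 << 16) |
           ((uint32_t)b1 <<  8) |
            (uint32_t)b0;
}

/* Loop form: builds the value from four bytes stored most significant
   byte first, shifting the 32-bit accumulator rather than the bytes. */
static uint32_t u32_from_msb_first(const uint8_t bytes[4])
{
    uint32_t value = 0;
    size_t i;
    for (i = 0; i < 4; i++) {
        value <<= 8;
        value |= bytes[i];
    }
    return value;
}
```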

Upvotes: 2

JeremyP

Reputation: 86661

If you want a portable way of accessing the bytes of a 4 byte integer, you either use a #define to change the macros depending on architecture,

#ifdef LITTLE_ENDIAN
#define OFFSET_0 0
#define OFFSET_1 1
#define OFFSET_2 2
#else
#define OFFSET_0 3
#define OFFSET_1 2
#define OFFSET_2 1
#endif



#define BYTE_0(var)         (*(((unsigned char*) &var) + OFFSET_0))
#define BYTE_1(var)         (*(((unsigned char*) &var) + OFFSET_1))
#define BYTE_2(var)         (*(((unsigned char*) &var) + OFFSET_2))

or, and this is preferable IMO if your architecture supports shifts of more than 1 bit, you use shifts and masks (CHAR_BIT comes from <limits.h>):

#define BYTE_0(var)         ((var) & 0xFF)
#define BYTE_1(var)         (((var) >> 1 * CHAR_BIT) & 0xFF)
#define BYTE_2(var)         (((var) >> 2 * CHAR_BIT) & 0xFF)

which works even for non-lvalues (e.g. the results of expressions, literals) and is portable.
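A short sketch of the non-lvalue case, assuming 32-bit unsigned operands and CHAR_BIT equal to 8 (the 0xFF mask already assumes 8-bit bytes). The helper byte1_of_sum is hypothetical.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

#define BYTE_0(var)         ((var) & 0xFF)
#define BYTE_1(var)         (((var) >> 1 * CHAR_BIT) & 0xFF)
#define BYTE_2(var)         (((var) >> 2 * CHAR_BIT) & 0xFF)

/* Hypothetical helper: takes byte 1 of a sum. The argument a + b is an
   expression, not an lvalue, so the pointer-based macros could not be
   applied to it (its address cannot be taken). */
static uint8_t byte1_of_sum(uint32_t a, uint32_t b)
{
    return (uint8_t)BYTE_1(a + b);
}
```

The macros apply equally to literals, for example BYTE_2(0x12345678UL).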

Upvotes: 0

Lundin

Reputation: 214385

Your macro is not portable: it assumes a little-endian architecture, and the C standard guarantees nothing about which byte your code reads. Endian-independent code is typically written with bitwise operators, because they behave the same way no matter where the least significant byte is stored.

some_long & 0xFF is guaranteed by the C standard to give you the least significant byte regardless of endianness, while *(uint8_t*)&some_long is endian-dependent.

This link answers your question in detail: http://www.ibm.com/developerworks/aix/library/au-endianc/ Write a macro similar to the one in listing 12, using bitwise shift and bitwise AND, and it will be portable.

Upvotes: 4

Matt Joiner

Reputation: 118600

No it won't. You've struck on the reason why little endian is better to work with than big endian.

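Also consider that, on a little-endian machine, reading a variable through a pointer to a narrower integer type does not require any pointer arithmetic, because the low-order bytes sit at the same address.

It's also wrong to assume that long will always be 4 bytes. Unfortunately the going standard on x64 Unix-like systems, for example, is LP64: long is 8 bytes there, and int was left behind as the 4-byte integer.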
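One way to avoid depending on the width of long is to use the fixed-width types from <stdint.h>. A minimal sketch, assuming a C99 toolchain that provides uint32_t; BITS_OF and reg32_t are hypothetical names.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

/* Width of a type in bits. */
#define BITS_OF(type)  (sizeof(type) * CHAR_BIT)

/* Hypothetical project-wide alias: uint32_t, where it exists, is exactly
   32 bits wide, whereas long is only guaranteed to be at least 32 bits. */
typedef uint32_t reg32_t;
```

With this, BITS_OF(reg32_t) is 32 on every target that provides uint32_t, while BITS_OF(long) may be 32 or 64 depending on the data model.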

Upvotes: 1

No, this is what endianness means. Your code won't work on machines with opposite endianness.

And I am not even entirely sure that doing such manual optimization matters. Perhaps a better compiler would optimize better...

Upvotes: 3
