Reputation:
I have some confusion about little endian vs. big endian. It seems I am missing
something simple. Some feedback would be appreciated.
For example, say we have two macros which retrieve the least and most significant bytes of a 32-bit value:
#define LSB(x) ((x) & 0x000000FF)
#define MSB(x) ((x) & 0xFF000000)
My question is: do above two functions return correct result both on big endian and little endian machines?
Now I will explain why I am confused. Imagine we are on a little endian machine, where the integer 9 is stored in memory like this (in hex): 09 00 00 00 (least significant byte first).
At some point you might think that the LSB macro above ends up evaluating 09 00 00 00 & 00 00 00 FF, which is 0. Of course, that is not how the LSB macro actually works, so it seems I am missing something. Any help appreciated.
Also, if I say int y = 0x000000FF, this is 255 regardless of the endianness of the machine, right?
Upvotes: 7
Views: 1886
Reputation: 47563
Regardless of endianness, x & 0xFF will give you the least significant byte.
First of all, you should understand the difference between endianness and significance. Endianness means in what order the bytes are written to memory; it's completely irrelevant to any computation in the CPU. Significance says which bits have a higher value; it's completely irrelevant to any system of storage.
Once you load a value from memory into the CPU, its endianness doesn't matter, since to the CPU (more accurately, the ALU) all that matters is the significance of the bits.
So, as far as C is concerned, 0x000000FF has 1s in its least significant byte, and ANDing it with a variable gives that variable's least significant byte.
In fact, the C standard (at least through C17) never uses the word "endian". C defines an "abstract machine" where only the significance of the bits matters. It's the responsibility of the compiler to compile the program in such a way that it behaves the same as the abstract machine, regardless of endianness. So unless you are expecting a certain layout of memory (for example through a union or a cast of pointers), you don't need to think about endianness at all.
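To make the value vs. memory-layout distinction concrete, here is a minimal sketch. The helper names are made up for illustration and it assumes 8-bit bytes. The first function works purely on the value; the second one peeks at the storage, which is the kind of code where endianness starts to matter.

```c
#include <stdint.h>
#include <string.h>

/* Value-level view: depends only on significance, so the result is
   the same on every machine. */
static uint32_t low_byte_by_value(uint32_t x)
{
    return x & 0xFFu;
}

/* Memory-level view: returns the byte stored at the lowest address of x.
   This is where endianness becomes visible. */
static uint8_t first_byte_in_memory(uint32_t x)
{
    uint8_t bytes[sizeof x];
    memcpy(bytes, &x, sizeof x);   /* copy the object representation */
    return bytes[0];
}

/* Hypothetical helper: non-zero if the host stores the least
   significant byte at the lowest address. */
static int host_is_little_endian(void)
{
    return first_byte_in_memory(1u) == 1u;
}
```

For 0x12345678, low_byte_by_value gives 0x78 everywhere, while first_byte_in_memory gives 0x78 on a little-endian host and 0x12 on a big-endian one.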
Another example that might interest you is shifting, where the same thing applies. Like I said before, endianness doesn't matter to the ALU, so << always means a shift towards the more significant bits, regardless of endianness. That is guaranteed not just by the compiler but by the CPU itself.
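As a small sketch of the shifting point (the function name is just an illustration, not something from the question), picking a byte out by significance needs only shifts and masks, so it never touches the memory layout:

```c
#include <stdint.h>

/* Returns byte number `index` of x, where index 0 is the least significant
   byte and index 3 the most significant. Only shifts and masks are used, so
   the result does not depend on how x is laid out in memory.
   Assumes index is in the range 0..3. */
static uint8_t byte_by_significance(uint32_t x, unsigned index)
{
    return (uint8_t)((x >> (8u * index)) & 0xFFu);
}
```

With x = 0x12345678, index 0 yields 0x78 and index 3 yields 0x12 on any machine.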
Let me put these in a diagram with two orthogonal directions, so maybe you'll understand it better. This is what a load operation looks like from the CPU's point of view.
On a little-endian machine you have:
      MEMORY                CPU Register
LSB BYTE2 BYTE3 MSB ---->   MSB
 \    \     \----------->   BYTE3
  \    \---------------->   BYTE2
   \-------------------->   LSB
On a big endian machine you have:
      MEMORY                CPU Register
   /-------------------->   MSB
  /    /---------------->   BYTE3
 /    /     /----------->   BYTE2
MSB BYTE3 BYTE2 LSB ---->   LSB
As you can see, in both cases, you have:
CPU Register
MSB
BYTE3
BYTE2
LSB
which means in both cases, the CPU ended up loading the exact same value.
Upvotes: 11
Reputation: 11
Endian is about how memory is used. You primarily have to worry about it when serializing or deserializing bytes to memory, storage or a stream of some kind.
I believe your macros will sometimes work and sometimes not work as expected, depending on how you use them. If x is an int (assuming you are using 32-bit ints), then you should be fine, since the compiler knows what an int is and how it is represented. When x is not a 32-bit number, you could run into problems.
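One way to read the "not a 32-bit number" caveat is that the MSB mask is hard-coded for 32 bits, so for wider or narrower operands it no longer selects the top byte. The sketch below shows that reading. TOP_BYTE is a hypothetical macro, not from the question, and it assumes an unsigned operand and 8-bit bytes.

```c
#include <limits.h>
#include <stdint.h>

/* The macros from the question, unchanged. */
#define LSB(x) ((x) & 0x000000FF)
#define MSB(x) ((x) & 0xFF000000)

/* Hypothetical width-aware variant: shifts the most significant byte of an
   unsigned operand down to bits 0..7, whatever the operand's size is. */
#define TOP_BYTE(x) (((x) >> ((sizeof(x) - 1) * CHAR_BIT)) & 0xFF)
```

For a uint64_t holding 0x1122334455667788, MSB yields 0x55000000 (bits 24 to 31, not the top byte), whereas TOP_BYTE yields 0x11. For a uint16_t holding 0xABCD, MSB yields 0 and TOP_BYTE yields 0xAB. None of this has anything to do with endianness; it is only about operand width.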
Upvotes: 0
Reputation: 145899
My question is: do above two functions return correct result both on big endian and little endian machines?
Yes, they do. The problem comes when you want to form a scalar from a multi-byte array, which is not what you are doing.
Upvotes: 1
Reputation: 490338
Yes, these work correctly regardless of endianness.
Both the number you use as the mask and the number you give these as input have the same endianness, so they give the same result either way.
Endianness becomes an issue primarily when you have (for example) an integer you've received over a network connection as an array of chars. In such a case, you have to put those chars back together in the right order to get the original value.
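A minimal sketch of putting the bytes back together, assuming the sender wrote the value in network (big-endian) byte order and that bytes are 8 bits wide. The function names are made up for illustration. Because it uses shifts rather than a pointer cast, it gives the same answer on little- and big-endian hosts.

```c
#include <stdint.h>

/* Rebuilds a 32-bit value from four bytes received in network
   (big-endian) order: bytes[0] is the most significant byte. */
static uint32_t read_u32_be(const unsigned char bytes[4])
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}

/* The reverse direction: writes value into out[] in network order. */
static void write_u32_be(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}
```

If the wire format were little-endian instead, the shift amounts would simply be reversed; the host's own endianness still would not appear anywhere in the code.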
Upvotes: 1
Reputation: 500673
As long as you treat the integer value as a single entity and not as a sequence of raw bytes (in memory, on the wire etc), the issue of endianness will not feature in your code.
Thus, 0x000000FF is always 255 and your LSB and MSB macros are correct.
Upvotes: 0
Reputation: 3269
0x000000FF is always 255, regardless of endianness. It is stored as FF 00 00 00 on little endian machines, so LSB(9) will continue to work.
Upvotes: 3