user2793162

Reputation:

confusion on little endian big endian

I have some confusion about little endian vs. big endian. It seems I am missing something simple, and some feedback would be appreciated. For example, say we have two macros that retrieve the least and most significant bytes of a 32-bit value:

#define LSB(x) ((x) & 0x000000FF)

#define MSB(x) ((x) & 0xFF000000)

My question is: do the two macros above return the correct result on both big-endian and little-endian machines?

Now I will explain why I am confused. Imagine we are on a little-endian machine. There, the integer 9 is stored in memory like this (in hex): 09 00 00 00 (least significant byte first). At some point you might think that using the LSB macro above gives the expression 09 00 00 00 & 00 00 00 FF, which is 0, but of course that's not how LSB actually works. So it seems I am missing something. Any help appreciated.

Also, if I say int y = 0x000000FF, this is 255 regardless of the endianness of the machine, right?

Upvotes: 7

Views: 1886

Answers (6)

Shahbaz

Reputation: 47563

Regardless of endianness, x & 0xFF will give you the least significant byte.

First of all, you should understand the difference between endianness and significance. Endianness means in what order the bytes are written to memory; it's completely irrelevant to any computation in the CPU. Significance says which bits have a higher value; it's completely irrelevant to any system of storage.

Once you load a value from memory into the CPU, its endianness doesn't matter, since to the CPU (more accurately, the ALU) all that matters is the significance of the bits.

So, as far as C is concerned, 0x000000FF has 1s in its least significant byte, and ANDing it with a variable gives that variable's least significant byte.
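To make this concrete, here is a minimal sketch that applies the question's macros to plain values. The helpers lsb_as_byte and msb_as_byte are hypothetical names added for illustration; they are not part of the question. One thing worth noticing: MSB(x) leaves the byte in its original position (bits 24..31), so a shift is needed if you want it as a number in the range 0..255.

```c
#include <stdint.h>

/* The question's macros, unchanged: they operate on the value, not on memory. */
#define LSB(x) ((x) & 0x000000FF)
#define MSB(x) ((x) & 0xFF000000)

/* Hypothetical helper: the least significant byte of x as a number 0..255. */
static uint32_t lsb_as_byte(uint32_t x)
{
    return LSB(x);
}

/* Hypothetical helper: the most significant byte of x, shifted down to 0..255. */
static uint32_t msb_as_byte(uint32_t x)
{
    return MSB(x) >> 24;
}
```

With these, lsb_as_byte(9) is 9 and msb_as_byte(0x12345678) is 0x12 on any machine, because nothing in the code looks at how the bytes are laid out in memory.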


In fact, the C standard (at least through C11) never uses the word "endian". C defines an "abstract machine" where only the significance of the bits matters. It's the responsibility of the compiler to compile the program in such a way that it behaves the same as the abstract machine, regardless of endianness. So unless you are expecting a certain layout of memory (for example through a union or a cast of pointers), you don't need to think about endianness at all.
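For contrast, this is roughly what "expecting a certain layout of memory" looks like. The sketch below copies the stored bytes of a value into an array so they can be inspected one by one. The helper names object_bytes and is_little_endian are hypothetical, and the code assumes 8-bit bytes and a plain little- or big-endian machine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of v into out[0..3],
   lowest address first. This is where endianness becomes visible. */
static void object_bytes(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}

/* Hypothetical helper: returns 1 if the byte at the lowest address
   is the least significant one (little endian), 0 otherwise. */
static int is_little_endian(void)
{
    unsigned char b[4];
    object_bytes(1u, b);
    return b[0] == 1;
}
```

On a little-endian machine object_bytes(0x000000FF, b) should leave FF in b[0]; on a big-endian machine it should leave FF in b[3]. The value itself is 255 in both cases; only the byte-by-byte view differs.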


Another example that might interest you is shifting, where the same thing applies. As I said before, endianness doesn't matter to the ALU, so << always means a shift towards the more significant bits. This isn't even arranged by the compiler; the CPU's shift instruction itself works that way, regardless of endianness.
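As a small illustration of shifting by significance, here is a sketch that picks out any byte of a 32-bit value. The name byte_at is a hypothetical helper, with n = 0 meaning the least significant byte.

```c
#include <stdint.h>

/* Hypothetical helper: byte n of x, counted by significance.
   n = 0 is the least significant byte, n = 3 the most significant. */
static uint32_t byte_at(uint32_t x, unsigned n)
{
    /* >> moves bits towards the less significant end, whatever the memory order is. */
    return (x >> (8u * n)) & 0xFFu;
}
```

For x = 0x12345678, byte_at(x, 0) is 0x78 and byte_at(x, 3) is 0x12 on both little-endian and big-endian machines.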


Let me put these in a diagram with two orthogonal directions, so maybe you will understand it better. This is what a load operation looks like from the CPU's point of view.

On a little-endian machine you have:

         MEMORY            CPU Register

  LSB BYTE2 BYTE3 MSB  ---->   MSB
    \    \     \----------->  BYTE3
     \    \---------------->  BYTE2
      \-------------------->   LSB

On a big endian machine you have:

         MEMORY            CPU Register

      /-------------------->   MSB
     /    /---------------->  BYTE3
    /    /     /----------->  BYTE2
  MSB BYTE3 BYTE2 LSB  ---->   LSB

As you can see, in both cases, you have:

CPU Register

    MSB
   BYTE3
   BYTE2
    LSB

which means in both cases, the CPU ended up loading the exact same value.

Upvotes: 11

Shawn Nelson

Reputation: 11

Endianness is about how values are laid out in memory. You primarily have to worry about it when serializing or deserializing bytes to memory, storage or a stream of some kind.

I believe your macros will sometimes work and sometimes not work as expected, depending on how you use them. If x is an int (assuming you are using 32-bit ints), then you should be fine, since the compiler knows what an int is and how it is represented. When x is not a 32-bit number, you could run into problems.
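Here is a sketch of the kind of problem that can show up with other widths. It has nothing to do with endianness: the MSB mask is simply fixed at bits 24..31. The helpers msb16 and msb64 are hypothetical names, shown as one possible way to get the top byte of 16-bit and 64-bit values.

```c
#include <stdint.h>

/* The question's macro: always selects bits 24..31, whatever the type of x. */
#define MSB(x) ((x) & 0xFF000000)

/* Hypothetical helper: top byte of a 16-bit value. */
static unsigned msb16(uint16_t x)
{
    return (x >> 8) & 0xFFu;
}

/* Hypothetical helper: top byte of a 64-bit value. */
static unsigned msb64(uint64_t x)
{
    return (unsigned)(x >> 56);
}
```

With a 16-bit input such as 0xABCD, MSB gives 0 because the value has no bits that high, while msb16 gives 0xAB. With the 64-bit input 0x1122334455667788, MSB gives 0x55000000, which is a middle byte, while msb64 gives 0x11.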

Upvotes: 0

ouah

Reputation: 145899

My question is: do the two macros above return the correct result on both big-endian and little-endian machines?

Yes, they do. The problem comes when you want to form a scalar from a multi-byte array, which is not what you are doing.

Upvotes: 1

Jerry Coffin

Reputation: 490338

Yes, these work correctly regardless of endianness.

Both the number you use as the mask and the number you give these as input have the same endianness, so they give the same result either way.

Endianness becomes an issue primarily when you have (for example) an integer you've received over a network connection as an array of chars. In such a case, you have to put those chars back together in the right order to get the original value.
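A minimal sketch of putting the bytes back together, assuming the sender wrote the value most significant byte first (the usual network byte order). The names load_be32 and store_be32 are hypothetical helpers. Because they only use shifts and ORs on values, the same code should work on a host of either endianness.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 32-bit value from 4 bytes that arrived
   most significant byte first (assumed wire format). */
static uint32_t load_be32(const unsigned char p[4])
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Hypothetical helper: the reverse, writing v most significant byte first. */
static void store_be32(uint32_t v, unsigned char p[4])
{
    p[0] = (unsigned char)((v >> 24) & 0xFFu);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
}
```

For example, the received bytes 12 34 56 78 (hex) become the value 0x12345678, and store_be32 followed by load_be32 gives back the original value.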

Upvotes: 1

NPE

Reputation: 500673

As long as you treat the integer value as a single entity and not as a sequence of raw bytes (in memory, on the wire etc), the issue of endianness will not feature in your code.

Thus, 0x000000FF is always 255 and your LSB and MSB macros are correct.

Upvotes: 0

Todd Li

Reputation: 3269

0x000000FF is always 255, regardless of endianness. It is stored as FF 00 00 00 on little endian machines, so LSB(9) will continue to work.

Upvotes: 3
