kev

Reputation: 161954

What makes a system little-endian or big-endian?

I'm confused with the byte order of a system/cpu/program.
So I must ask some questions to make my mind clear.

Question 1

If I only use type char in my C++ program:

void main()
{
    char c = 'A';
    char* s = "XYZ";    
}

Then I compile this program to an executable binary file called a.out.
Can a.out run on both little-endian and big-endian systems?

Question 2

If my Windows XP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox? What makes a system little-endian or big-endian?

Question 3

If I want to write a byte-order-independent C++ program, what do I need to take into account?

Upvotes: 13

Views: 15301

Answers (6)

Bakaiya

Reputation: 43

1: The output of the compiler will depend on the options you give it and on whether you use a cross-compiler. By default, it should run on the operating system you are compiling it on and not others (perhaps not even others of the same type; not all Linux binaries run on all Linux installs, for example). In large projects, this will be the least of your concerns, as libraries, etc., will need to be built and linked differently on each system. Using a proper build system (like make) will take care of most of this without you needing to worry.

2: Virtual machines abstract the hardware in such a way as to allow essentially anything to run within anything else. How the operating systems manage their memory is unimportant as long as they both run on the same hardware and support whatever virtualization model is in use. Endianness is the byte order: whether the bytes of a multi-byte value are stored most significant byte first, least significant byte first, or in some other arrangement. Some hardware supports both, and virtualization allows both to coexist in that case (although I am not aware of how this would be useful except that it is possible in theory). However, Linux works on many different architectures (and Windows some other than Ixxx), so the situation is more complicated.

3: If you monkey with raw memory, such as with binary operators, you might put yourself in a position of depending on endianness. However, most modern programming is at a higher level than this. As such, you are likely to notice if you get into something which may impose endianness-based limitations. If such is ever required, you can always implement options for both endiannesses using the preprocessor.
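The preprocessor approach mentioned in point 3 might look roughly like the sketch below. It assumes a GCC- or Clang-compatible compiler that predefines `__BYTE_ORDER__`; other compilers may need a different check. The helper names `byte_swap32` and `to_little_endian32` are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the four bytes of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Hypothetical helper: return `v` laid out in memory as little-endian.
// Assumption: the compiler predefines __BYTE_ORDER__ (GCC and Clang do).
inline std::uint32_t to_little_endian32(std::uint32_t v) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return byte_swap32(v);  // big-endian host: bytes must be reversed
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return v;               // little-endian host: already in the right layout
#else
#error "Unknown byte order; add a check for this compiler"
#endif
}
```

Keeping the `#if` inside one small helper means the rest of the code never has to mention endianness.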

Upvotes: 2

Mysticial

Reputation: 471569

Question 1:

Can a.out both run on little-endian and big-endian system?

No. Because a.out is already compiled for whatever architecture it is targeting. It will not run on another architecture that it is incompatible with.

However, the source code for that simple program has nothing that could possibly break on different endian machines.

So yes, it (the source) will work properly. (Well... aside from void main(), which should be int main() instead.)

Question 2:

If my WindowsXP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox?

Endian-ness is determined by the hardware, not the OS. So whatever (native) VM you install on it, will be the same endian as the host. (since x86 is all little-endian)

What makes a system little-endian or big-endian?

Here's an example of something that will behave differently on little vs. big-endian:

uint64_t a = 0x0123456789abcdefull;
uint32_t b = *(uint32_t*)&a;
printf("b is %x", b);

*Note that this violates strict-aliasing, and is only for demonstration purposes.

Little Endian : b is 89abcdef
Big Endian    : b is 1234567

On little-endian, the lower bits of a are stored at the lowest address. So when you access a as a 32-bit integer, you will read the lower 32 bits of it. On big-endian, you will read the upper 32 bits.
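For comparison, here is a sketch of the same experiment without the strict-aliasing violation, using `std::memcpy` to copy the bytes at the lowest addresses. The function names `first_four_bytes` and `host_is_little_endian` are illustrative, not taken from any library.

```cpp
#include <cstdint>
#include <cstring>

// Copy the four bytes stored at the lowest addresses of a 64-bit value
// into a 32-bit value. memcpy makes this well-defined, unlike the
// pointer cast used in the demonstration above.
inline std::uint32_t first_four_bytes(std::uint64_t a) {
    std::uint32_t b;
    std::memcpy(&b, &a, sizeof b);
    return b;
}

// True when the lowest-addressed bytes hold the low-order bits.
inline bool host_is_little_endian() {
    return first_four_bytes(0x0123456789abcdefull) == 0x89abcdefu;
}
```

The result is still endian-dependent by design; the only change is that the read itself no longer relies on undefined behavior.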

Question 3:

If I want to write a byte-order independent C++ program, what do I need to take into account?

Just follow the standard C++ rules and don't do anything ugly like the example I've shown above. Avoid undefined behavior, avoid type-punning...

Upvotes: 8

Nicol Bolas

Reputation: 474386

Can a.out both run on little-endian and big-endian system?

No, because pretty much any two CPUs that are so different as to have different endian-ness will not run the same instruction set. C++ isn't Java; you don't compile to bytecode that then gets compiled or interpreted at run time. You compile to machine code for a specific CPU. And endian-ness is part of the CPU.

But that's outside of endian issues. You can compile that program for different CPUs and those executables will work fine on their respective CPUs.

What makes a system little-endian or big-endian?

As far as C or C++ is concerned, the CPU. Different processing units in a computer can actually have different endians (the GPU could be big-endian while the CPU is little endian), but that's somewhat uncommon.

If I want to write a byte-order independent C++ program, what do I need to take into account?

As long as you play by the rules of C or C++, you don't have to care about endian issues.

Of course, you also won't be able to load files directly into POD structs. Or read a series of bytes, pretend it is a series of unsigned shorts, and then process it as a UTF-16-encoded string. All of those things step into the realm of implementation-defined behavior.

There's a difference between "undefined" and "implementation-defined" behavior. When the C and C++ spec say something is "undefined", it basically means all manner of brokenness can ensue. If you keep doing it, (and your program doesn't crash) you could get inconsistent results. When it says that something is defined by the implementation, you will get consistent results for that implementation.

If you compile for x86 in VC2010, what happens when you pretend a byte array is an unsigned short array (ie: unsigned char *byteArray = ...; unsigned short *usArray = (unsigned short*)byteArray) is defined by the implementation. When compiling for big-endian CPUs, you'll get a different answer than when compiling for little-endian CPUs.

In general, endian issues are things you can localize to input/output systems. Networking, file reading, etc. They should be taken care of in the extremities of your codebase.
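One way to keep endian handling at the I/O boundary is to build and read multi-byte values with shifts, which operate on values rather than on memory layout. The sketch below assumes a big-endian on-disk or on-wire format chosen purely for illustration; `store_be32` and `load_be32` are hypothetical names.

```cpp
#include <cstdint>

// Write `v` into out[0..3], most significant byte first (big-endian),
// regardless of the host's byte order.
inline void store_be32(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
}

// Read a big-endian 32-bit value from in[0..3] on any host.
inline std::uint32_t load_be32(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8)  |
            static_cast<std::uint32_t>(in[3]);
}
```

Because the same source produces the same bytes on every host, no `#if` on byte order is needed here.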

Upvotes: 21

Sergey Kalinichenko

Reputation: 727047

Little-endian / big-endian is a property of the hardware. In general, binary code compiled for one kind of hardware cannot run on another, except in virtualization environments that interpret the machine code and emulate the target hardware for it. There are bi-endian CPUs (e.g. ARM, IA-64) that feature a switch to change endianness.

As far as byte-order-independent programming goes, the only case when you really need to do it is when dealing with networking. There are functions such as ntohl and htonl to help you convert between your hardware's byte order and the network's byte order.
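A minimal sketch of how htonl and ntohl are typically used, assuming a POSIX system where they are declared in `<arpa/inet.h>` (on Windows they come from `<winsock2.h>`). The wrapper names `put_u32_network` and `get_u32_network` are hypothetical.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX header; an assumption about the platform)
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a host-order value into a buffer
// in network byte order (big-endian).
inline void put_u32_network(unsigned char* out, std::uint32_t host_value) {
    std::uint32_t net = htonl(host_value);
    std::memcpy(out, &net, sizeof net);
}

// Hypothetical helper: read a network-order value from a buffer
// and convert it back to host order.
inline std::uint32_t get_u32_network(const unsigned char* in) {
    std::uint32_t net;
    std::memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

On a big-endian host both conversions are no-ops; on a little-endian host they swap the bytes, so the buffer contents are the same either way.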

Upvotes: 3

N_A

Reputation: 19897

The first thing to clarify is that endianness is a hardware attribute, not a software/OS attribute, so WinXP and Linux are not big-endian or little endian, but rather the hardware on which they run is either big-endian or little endian.

Endianness is a description of the order in which the bytes are stored in a data type. A big-endian system stores the most significant byte (the one with the biggest place value) first, and a little-endian system stores the least significant byte first. It is not mandatory for every data type on a system to use the same order, so you can have mixed-endian systems.
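To see the storage order described above, one can dump the bytes of a value in address order. This is only an illustrative sketch; `bytes_in_memory_order` is a made-up name.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Return the bytes of `v` as hex text, lowest address first.
inline std::string bytes_in_memory_order(std::uint32_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);  // inspect the object representation
    std::string out;
    char buf[4];                       // two hex digits, a space, and '\0'
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(b));
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```

For `0x01020304`, a little-endian host yields `04 03 02 01` and a big-endian host yields `01 02 03 04`.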

A program compiled for a little-endian system would not run on a big-endian system, but that has more to do with the instruction set available than with the endianness of the system on which it was compiled.

If you want to write a byte-order independent program you simply need to not depend on the byte order of your data.

Upvotes: 2

Gigi

Reputation: 4962

The endianness of a system determines how the bytes of a multi-byte value are interpreted, that is, which byte is considered the "first" and which is considered the "last".

You need to care about it only when loading from or saving to sources external to your program, like disks or networks.

Upvotes: 1
