Dan

Moving binary files between big endian and little endian platforms

I know that network byte order is big endian regardless of host endianness.

My question is what happens when a binary file is moved from a BE to an LE platform. From this post I can see that the byte order of data on disk is the same as the platform's in-memory byte order: https://stackoverflow.com/a/5751824

So assume I have a small binary file on an LE machine with the content 2 5 in LE order (note that I turned this into an actual binary file with xxd):

00000010 00000101

Didn't the file have to be reversed to 00000101 00000010 before transmission? If so, wouldn't the BE machine store it as is, in network order, after receiving it? Why isn't the file content reversed after the download?

Answers (2)

Erik Eidt

"My question is what happens when a binary file is moved from a BE to LE platform."

The file's bytes should be copied without any swapping: the bytes should appear in the same order on every machine, byte for byte.
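
A minimal sketch of why a plain copy is safe, assuming Python and a temporary directory standing in for the two machines (the file names are illustrative only). The copy moves bytes one after another and never looks at multi-byte values, so their order cannot change.

```python
import os
import shutil
import tempfile

# The two bytes from the question: 0x02 followed by 0x05.
original = bytes([0b00000010, 0b00000101])

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.bin")
    dst = os.path.join(tmp, "copy.bin")  # stand-in for the file on the other machine
    with open(src, "wb") as f:
        f.write(original)
    # A copy (or a binary-mode network transfer) moves the bytes in sequence,
    # without knowing or caring what multi-byte values they might encode.
    shutil.copyfile(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(copied == original)  # True
print(copied.hex())        # 0205
```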

For simple text it doesn't matter, since sequences of individual bytes are not endian in nature.

For almost anything else, there will be a defined file format that specifies the endianness of numeric fields larger than 8 bits. This is what you have already observed about TCP/IP, which defines big endian for its headers.
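
As a sketch of a format that fixes its own byte order, consider a hypothetical header (not any real format) with a 4-byte magic tag, a 16-bit version and a 32-bit payload length, both numbers specified as big endian. Python's `struct` module uses `!` for network (big-endian) order with no padding, so writer and reader agree no matter which machine runs them; `write_header` and `read_header` are illustrative names.

```python
import struct

# Hypothetical header layout: 4-byte magic, uint16 version, uint32 length,
# all multi-byte fields big endian ("!" = network byte order, no padding).
HEADER = struct.Struct("!4sHI")

def write_header(version: int, length: int) -> bytes:
    return HEADER.pack(b"DEMO", version, length)

def read_header(data: bytes) -> tuple[bytes, int, int]:
    magic, version, length = HEADER.unpack_from(data)
    return magic, version, length

raw = write_header(version=2, length=1000)
print(raw.hex())         # 44454d4f0002000003e8
print(read_header(raw))  # (b'DEMO', 2, 1000)
```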

JPEG, PNG and other formats either avoid multi-byte numerics or define how the bytes in the file are to be interpreted when multi-byte numeric values are used.
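
PNG is a convenient example, since its specification stores multi-byte integers in network (big-endian) order. The sketch below builds only the signature and IHDR chunk by hand (not a complete, viewable image) and reads the width and height back; `make_ihdr_prefix` and `png_dimensions` are hypothetical helper names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_ihdr_prefix(width: int, height: int) -> bytes:
    """Build the PNG signature plus an IHDR chunk (not a complete PNG)."""
    # IHDR data: width, height (uint32 BE), then bit depth, color type,
    # compression, filter, interlace (1 byte each) = 13 bytes.
    data = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    chunk_type = b"IHDR"
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF  # CRC covers type + data
    return (PNG_SIGNATURE + struct.pack(">I", len(data)) + chunk_type
            + data + struct.pack(">I", crc))

def png_dimensions(blob: bytes) -> tuple[int, int]:
    """Read width and height from the IHDR chunk, which must come first."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", blob[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height = struct.unpack(">II", blob[16:24])
    return width, height

header = make_ihdr_prefix(640, 480)
print(header[16:24].hex())     # 00000280000001e0
print(png_dimensions(header))  # (640, 480)
```

The reader never consults the host's byte order: `>` in the format string makes the result identical on any machine.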

Certain data formats use a byte order mark (BOM) as part of a flexible format that lets the writer choose the endianness (so it can pick the one that is natural for the writing system, if desired) and lets the reader determine the endianness of the file.
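
A minimal sketch of the BOM idea for a hypothetical format of 16-bit integers (not a real file format): the writer emits the marker value 0xFEFF in whichever byte order it prefers, followed by the data in that same order, and the reader inspects the first two bytes to decide how to decode the rest. `write_u16s` and `read_u16s` are illustrative names.

```python
import struct
import sys

BOM = 0xFEFF

def write_u16s(values: list[int], byteorder: str = sys.byteorder) -> bytes:
    """Write a BOM followed by 16-bit values in the writer's chosen byte order."""
    prefix = "<" if byteorder == "little" else ">"
    return struct.pack(f"{prefix}{len(values) + 1}H", BOM, *values)

def read_u16s(data: bytes) -> list[int]:
    """Use the BOM to discover the byte order, then decode the remaining values."""
    if data[:2] == b"\xff\xfe":
        prefix = "<"  # 0xFEFF was stored little endian
    elif data[:2] == b"\xfe\xff":
        prefix = ">"  # 0xFEFF was stored big endian
    else:
        raise ValueError("missing byte order mark")
    count = len(data) // 2 - 1
    return list(struct.unpack_from(f"{prefix}{count}H", data, 2))

le_file = write_u16s([2, 5], "little")
be_file = write_u16s([2, 5], "big")
print(le_file.hex(), be_file.hex())            # fffe02000500 feff00020005
print(read_u16s(le_file), read_u16s(be_file))  # [2, 5] [2, 5]
```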

For multi-byte text, the Unicode encodings use some of the above features (UTF-16 and UTF-32 can carry a BOM), but UTF-8, the widely used byte-oriented encoding, is interpreted as a "simple" sequence of bytes (rather than multi-byte numbers) and doesn't need a BOM or any notion of endianness.
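
A quick way to see this with Python's built-in codecs: the UTF-8 bytes for a string are the same on every machine, while UTF-16 has distinct big- and little-endian byte sequences (the `utf-16-be` and `utf-16-le` codecs write no BOM).

```python
text = "\u00e9\u20ac"  # "é" and "€"

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex())      # c3a9e282ac  (one fixed byte sequence, no endianness)
print(utf16_be.hex())  # 00e920ac    (each 16-bit unit, high byte first)
print(utf16_le.hex())  # e900ac20    (each 16-bit unit, low byte first)
```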

Paul Johnson

Memory is organized as a sequence of bytes on both big- and little-endian machines, so where data is merely a sequence of bytes, as with ASCII or UTF-8, there is no problem. The problems start when groups of 2, 4 or 8 bytes are interpreted as numbers: at that point you need to define whether the first or the last byte in a group is the least significant. That is the difference between big and little endian.
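
To make the "first or last byte" point concrete, the sketch below takes the four bytes 02 05 00 01 (an arbitrary example extending the question's 02 05) and decodes them as 16- and 32-bit integers under each byte order, using the standard `struct` format prefixes `<` (little endian) and `>` (big endian).

```python
import struct

data = bytes([0x02, 0x05, 0x00, 0x01])

# Two 16-bit values: is the first byte of each pair the least or most significant?
le16 = struct.unpack("<2H", data)  # (1282, 256)  i.e. 0x0502, 0x0100
be16 = struct.unpack(">2H", data)  # (517, 1)     i.e. 0x0205, 0x0001

# One 32-bit value from the same bytes.
le32 = struct.unpack("<I", data)   # (16778498,)  i.e. 0x01000502
be32 = struct.unpack(">I", data)   # (33882113,)  i.e. 0x02050001

print(le16, be16, le32, be32)
```

The bytes never change; only the rule for combining them into numbers does.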

If you represent text in UTF-16, where text is coded as a sequence of 2-byte values, then endianness matters; in fact, UTF-16 text may start with a byte order mark, the character U+FEFF (ZERO WIDTH NO-BREAK SPACE), to indicate the byte order.
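
A short sketch with Python's built-in codecs: the generic `utf-16` decoder reads the BOM, picks the byte order from it and strips the mark, so both byte orders decode to the same text. `codecs.BOM_UTF16_LE` and `codecs.BOM_UTF16_BE` hold the two possible encodings of U+FEFF.

```python
import codecs

text = "Hi"

# The same text serialized with each byte order, each prefixed by its BOM (U+FEFF).
le_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be_bytes = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(le_bytes.hex())  # fffe48006900
print(be_bytes.hex())  # feff00480069

# The generic "utf-16" decoder inspects the BOM, chooses the byte order and drops the mark.
print(le_bytes.decode("utf-16"))  # Hi
print(be_bytes.decode("utf-16"))  # Hi
```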