Reputation: 2886
I know that network byte order is big endian regardless of host endianness.
My question is what happens when a binary file is moved from a BE to LE platform. From this post I can see the data byte order on disk is the same as memory byte order of the platform: https://stackoverflow.com/a/5751824
So assume I have a small binary file on a LE machine with this content in LE order, 2 5 (note I have turned this into an actual binary file with xxd):
00000010 00000101
I then use wget to download this file on a BE machine, and the content there is still 00000010 00000101.
Didn't the file have to be reversed to 00000101 00000010 before transmission? If so, wouldn't the BE machine store it as is, in network order, after it has received it? How is the file content not reversed after download?
Upvotes: 0
Views: 621
Reputation: 26656
My question is what happens when a binary file is moved from a BE to LE platform.
The file's bytes should be copied without being swapped at all: they should appear in the same order on all machines, byte by byte.
For simple text it doesn't matter, since sequences of individual bytes are not endian in nature.
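For almost anything else, there will be a defined file format, which specifies the endianness of numeric fields larger than 8 bits. This is what you have already observed about TCP/IP, which defines big endian for the headers. That rule covers the multi-byte header fields only; the payload (your file) is carried as an opaque sequence of bytes and is not reordered.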
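As a rough illustration of a format that fixes its own byte order, here is a minimal Python sketch. The record layout (a 16-bit version followed by a 32-bit length, both big endian) and the names RECORD, encode_record and decode_record are made up for this example; they do not come from any real file format.

```python
import struct

# Hypothetical record layout: 16-bit version, then 32-bit length.
# The ">" prefix fixes the byte order to big endian, independent of the host.
RECORD = struct.Struct(">HI")

def encode_record(version, length):
    """Serialize the two fields in the byte order the format declares."""
    return RECORD.pack(version, length)

def decode_record(data):
    """Parse the fields back; the result is the same on LE and BE hosts."""
    return RECORD.unpack(data)

raw = encode_record(2, 5)
print(raw.hex())           # 000200000005
print(decode_record(raw))  # (2, 5)
```

Because the byte order is part of the format rather than of the machine, the file can be copied byte for byte between platforms and still be read correctly.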
JPG, PNG, and other formats either avoid multi-byte numerics or define how the bytes in the file are interpreted when multi-byte numeric values are used.
Certain data formats use a byte order mark (BOM). This is part of a flexible format that lets the writer choose the endianness (for example, whichever is natural for the writing system) and lets the reader determine the endianness of the file.
For multi-byte text, Unicode uses some of the above features, but the more modern encoding, UTF-8, is meant to be interpreted as a simple sequence of bytes (rather than multi-byte numbers) and doesn't need a BOM or any notion of endianness.
Upvotes: 3
Reputation: 17786
Memory is organized as a sequence of bytes on both big-endian and little-endian machines, so where data is merely a sequence of bytes, as for instance ASCII or UTF-8, there is no problem. The problems start when groups of 2, 4, or 8 bytes are interpreted as numbers, at which point you need to define whether the first or last byte in a group is the least significant. That is the difference between big and little endian.
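A small Python sketch of that distinction, using the two bytes from the question. The variable names are only illustrative.

```python
# The two bytes from the question, in the order they sit in the file.
data = bytes([0b00000010, 0b00000101])

# The byte sequence itself is identical on every machine.
print(data.hex())  # 0205

# Only the interpretation as a 16-bit number depends on the byte order.
as_le = int.from_bytes(data, "little")  # first byte is least significant
as_be = int.from_bytes(data, "big")     # first byte is most significant
print(as_le)  # 1282 (0x0502)
print(as_be)  # 517  (0x0205)
```

Copying the file never changes data; endianness only matters at the point where a program decides to read those two bytes as one number.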
If you represent text in UTF-16, where text is coded as a sequence of 2-byte values, then endianness matters, and in fact UTF-16 text can start with a byte order mark, the character U+FEFF (zero width no-break space), to indicate the byte order.
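To make the UTF-16 point concrete, here is a short Python sketch using the standard codecs module. It encodes a single character both ways and lets the generic "utf-16" decoder pick the byte order from the BOM. The variable names are illustrative only.

```python
import codecs

text = "A"

# UTF-8 is a plain byte sequence: no byte order involved.
utf8 = text.encode("utf-8")     # b'A'

# UTF-16 uses 2-byte units, so the two byte orders differ.
le = text.encode("utf-16-le")   # b'A\x00'
be = text.encode("utf-16-be")   # b'\x00A'

# Prefixing the BOM (U+FEFF in the matching byte order) tells the reader
# which order was used; the "utf-16" decoder consumes it and strips it.
decoded_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
decoded_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")

print(utf8, le, be)
print(decoded_le == decoded_be == text)  # True
```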
Upvotes: 0