user11680003

Reputation:

No big endian and little endian in strings?

We know that machines differ in byte ordering: some store a multi-byte object in memory from the least significant byte to the most significant, while others store it from most to least significant. Take, for example, the hexadecimal value 0x01234567.

[Figure: byte-by-byte memory layout of 0x01234567 on a big-endian machine versus a little-endian machine]

So if we write a C program that prints each byte starting from the object's memory address, big-endian and little-endian machines produce different results.
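A minimal sketch of such a program, assuming a 4-byte unsigned int:

```c
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x01234567;
    const unsigned char *p = (const unsigned char *)&x;

    /* Print each byte of x, starting at the lowest address. */
    for (size_t i = 0; i < sizeof x; i++)
        printf("%.2x ", p[i]);
    printf("\n");
    return 0;
}
```

A little-endian machine prints 67 45 23 01, while a big-endian machine prints 01 23 45 67.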

But for strings, the same result would be obtained on any system using ASCII as its character code, independent of the byte ordering and word size conventions. As a consequence, text data is more platform-independent than binary data.

So my question is: why do we distinguish big-endian from little-endian for binary data? We could make it the same as text data, which is platform-independent. What is the point of having big-endian and little-endian machines just for binary data?

Upvotes: 3

Views: 2923

Answers (2)

Dai

Reputation: 155015

So my question is: why do we distinguish big-endian from little-endian for binary data? We could make it the same as text data, which is platform-independent. What is the point of having big-endian and little-endian machines just for binary data?

In short: we already do. For example, a file format specification will dictate whether a 32-bit integer is serialized in big-endian or little-endian order. Similarly, network protocols dictate the byte order of multi-byte values (which is why htons exists).
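For instance, here is a sketch (assuming a POSIX environment for arpa/inet.h) of converting a value to network byte order before serializing it:

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>  /* htons: host-to-network byte-order conversion */

int main(void)
{
    uint16_t port = 8080;        /* host byte order: may be LE or BE */
    uint16_t wire = htons(port); /* network byte order: always big-endian */

    /* The serialized bytes are 1f 90 on every host. */
    const unsigned char *p = (const unsigned char *)&wire;
    printf("%.2x %.2x\n", p[0], p[1]);
    return 0;
}
```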

However, if we're only concerned with the in-memory representation of binary data (and not serialized binary data), then it makes sense to store values using the fastest representation, i.e. the byte order natively preferred by the CPU and ISA. For x86 and x64 this is little-endian; 68k and traditional MIPS systems use big-endian; and most non-x86 ISAs today, including ARM, support both modes, with ARM in practice almost always running little-endian.
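One common way to observe the CPU's native byte order at runtime is to inspect the first byte of a known multi-byte value; a minimal sketch:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t one = 1;

    /* On a little-endian CPU the least significant byte (0x01) sits at
       the lowest address; on a big-endian CPU that byte is 0x00. */
    if (*(const unsigned char *)&one == 1)
        puts("little-endian");
    else
        puts("big-endian");
    return 0;
}
```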

But for strings, the same result would be obtained on any system using ASCII as its character code, independent of the byte ordering and word size conventions. As a consequence, text data is more platform-independent than binary data.

So my question is: why do we distinguish big-endian from little-endian for binary data? We could make it the same as text data, which is platform-independent.

In short:

  • ASCII strings are not integers.
  • Integers are not ASCII strings.

You're essentially asking why we don't represent integer numbers in a Base-10, big-endian format, the way text is. We don't because Base-10 is difficult for digital computers to work with: digital computers work in Base-2. The closest thing we have to what you're describing is binary-coded decimal (BCD), and the reason computers today don't normally use it is that it's slow and inefficient. Since only 4 bits are needed to represent a Base-10 digit in Base-2, you could "pack" two Base-10 digits into a single byte, but that tends to be slow because CPUs are generally fastest on word-sized (and at least byte-sized) values, not nibble-sized (half-byte) values. And this still doesn't solve the big-endian vs. little-endian problem: BCD values could be stored in either byte order, just as even char-based strings could be stored in reverse order without affecting how they're processed.
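A minimal sketch of the nibble packing described above, using a hypothetical bcd_pack helper written purely for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: pack two decimal digits (0-9 each) into one
   byte, one digit per 4-bit nibble. */
unsigned char bcd_pack(unsigned hi, unsigned lo)
{
    return (unsigned char)((hi << 4) | (lo & 0x0F));
}

int main(void)
{
    unsigned char b = bcd_pack(4, 2);  /* encodes the decimal digits "42" */
    printf("0x%02X -> %u%u\n",
           (unsigned)b, (unsigned)(b >> 4), (unsigned)(b & 0x0F));
    /* prints: 0x42 -> 42 */
    return 0;
}
```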

Upvotes: 2

John Bode

Reputation: 123458

Array elements are always addressed from low to high, regardless of endianness conventions.

ASCII and UTF-8 strings are arrays of char, which is not a multibyte type and is not affected by endianness conventions.

"Wide" strings, where each character is represented by wchar_t or another multibyte type, will be affected, but only for the individual elements, not the string as a whole.

Upvotes: 6
