Mitch Ostler

Reputation: 73

Is there an architecture-independent method to create a little-endian byte stream from a value in C?

I am trying to transmit values between architectures by creating a uint8_t[] buffer and then sending that. To ensure they are transmitted correctly, the spec requires all values to be converted to little-endian as they go into the buffer.

I read one article that discussed how to convert from one endianness to the other, and another that discussed how to check the endianness of the system.

I am curious whether there is a method to read bytes from a uint64_t or other value in little-endian order regardless of whether the system is big- or little-endian (i.e. through some sequence of bitwise operations)?

Or is the only method to first check the endianness of the system, and then, if it is big-endian, explicitly convert to little-endian?

Upvotes: 0

Views: 517

Answers (4)

Chris Dodd

Reputation: 126175

That's actually quite easy -- you just use shifts to convert between 'native' format (whatever that is) and little-endian:

/* put a 32-bit value into a buffer in little-endian order (4 bytes) */
void put32(uint8_t *buf, uint32_t val) {
    buf[0] = val;
    buf[1] = val >> 8;
    buf[2] = val >> 16;
    buf[3] = val >> 24;
}

/* get a 32-bit value from a buffer (little-endian) */
uint32_t get32(uint8_t *buf) {
    return (uint32_t)buf[0] + ((uint32_t)buf[1] << 8) +
           ((uint32_t)buf[2] << 16) + ((uint32_t)buf[3] << 24);
}

If you put a value into a buffer, transmit it as a byte stream to another machine, and then get the value from the received buffer, the two machines will have the same 32-bit value regardless of whether they have the same or different native byte ordering. The casts are needed because the default promotions will just convert to int, which might be smaller than a uint32_t, in which case the shifts could be out of range.
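The same shift pattern extends directly to the 64-bit case from the question; a sketch (put64 and get64 are names I've made up to match the above):

#include <stdint.h>

/* put a 64-bit value into a buffer in little-endian order (8 bytes) */
void put64(uint8_t *buf, uint64_t val) {
    for (int i = 0; i < 8; i++)
        buf[i] = (uint8_t)(val >> (8 * i));
}

/* get a 64-bit value from a buffer (little-endian) */
uint64_t get64(const uint8_t *buf) {
    uint64_t val = 0;
    for (int i = 0; i < 8; i++)
        val |= (uint64_t)buf[i] << (8 * i);
    return val;
}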

Be careful if your buffers are char rather than uint8_t (char might or might not be signed) -- you need to mask in that case:

uint32_t get32(char *buf) {
    return ((uint32_t)buf[0] & 0xff) + (((uint32_t)buf[1] & 0xff) << 8) +
           (((uint32_t)buf[2] & 0xff) << 16) + (((uint32_t)buf[3] & 0xff) << 24);
}

Upvotes: 3

Clifford

Reputation: 93456

For two systems that must communicate, you specify an "intercommunication byte order". Then you have functions that convert between that and the native architecture byte order of each system.

There are three approaches to this problem. In order of efficiency:

  1. Compile-time detection of endianness
  2. Run-time detection of endianness
  3. Endian-agnostic code (corresponding to the "sequence of bitwise operations" in your question).

Compile-time detection of endianness

On architectures whose byte order is the same as the intercomm byte order, these functions do no transformation, but by using them, the same code becomes portable between systems.

Such functions may already exist on your target platform; for example, POSIX provides ntohs()/htons() et al. for network (big-endian) byte order. Where they don't exist, creating them with cross-platform support is trivial. For example:

uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    #if __BIG_ENDIAN__
        return intercom_word ;
    #else
        return intercom_word >> 8 | intercom_word << 8 ;
    #endif
}

Here I have assumed that the intercom order is big-endian, which makes the function compatible with network byte order per ntohs() et al. The macro __BIG_ENDIAN__ is predefined by most compilers. If not, simply define it as a command-line macro when compiling, e.g. -D __BIG_ENDIAN__.
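As an alternative to defining the macro yourself, GCC and Clang predefine __BYTE_ORDER__ together with __ORDER_BIG_ENDIAN__ and __ORDER_LITTLE_ENDIAN__; a sketch using those (other compilers may still need the command-line macro above):

uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        return intercom_word ;
    #else
        return intercom_word >> 8 | intercom_word << 8 ;
    #endif
}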

Run-time detection of endianness

It is possible to detect endianness at runtime with minimal overhead:

uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    static const union
    {
        uint16_t word ;
        uint8_t bytes[2] ;
    } test = {.word = 0xff00u } ;

    return test.bytes[0] == 0xffu ? 
                            intercom_word :
                            intercom_word >> 8 | intercom_word << 8 ;
}

Of course you might wrap the test in a function for use in similar functions for other word sizes:

#include <stdbool.h>
#include <stdint.h>

bool isBigEndian()
{
        static const union
        {
            uint16_t word ;
            uint8_t bytes[2] ;
        } test = {.word = 0xff00u } ;

        return test.bytes[0] == 0xffu ;
}

Then simply have:

uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    return isBigEndian() ? intercom_word :
                           intercom_word >> 8 | intercom_word << 8 ;
}

Endian-agnostic code

It is entirely possible to use endian-agnostic code, but in that case all participants in the communication or file processing bear the software overhead even if the native byte order is already the same as the intercom byte order:

#include <string.h>

uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    /* Decompose arithmetically, then reassemble through memory in native order */
    uint8_t host_word [2] = { intercom_word >> 8,
                              intercom_word & 0xffu } ;
    uint16_t result ;
    memcpy( &result, host_word, sizeof result ) ;  /* avoids aliasing issues */
    return result ;
}
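If the received bytes are still in a uint8_t buffer, an alternative endian-agnostic form composes the value positionally from the bytes, so no object representation is reinterpreted at all. A sketch, again assuming big-endian intercom order (the function name is mine):

#include <stdint.h>

uint16_t intercom_buffer_to_host_16( const uint8_t buf[2] )
{
    /* buf[0] is the most significant byte on the wire */
    return (uint16_t)( buf[0] << 8 | buf[1] ) ;
}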

Upvotes: 0

You can always serialize a uint64_t value to an array of uint8_t in little-endian order as simply

uint64_t source = ...;
uint8_t target[8];

target[0] = source;
target[1] = source >> 8;
target[2] = source >> 16;
target[3] = source >> 24;
target[4] = source >> 32;
target[5] = source >> 40;
target[6] = source >> 48;
target[7] = source >> 56;

or

for (int i = 0; i < sizeof (uint64_t); i++) {
    target[i] = source >> i * 8;
}

and this will work anywhere that uint64_t and uint8_t exist.

Notice that this assumes that the source value is unsigned. Bit-shifting negative signed values will cause all sorts of headaches and you just don't want to do that.
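If you do need to transmit a signed value, one approach (a sketch; serialize_i64 is a made-up name) is to convert it to the unsigned type first. The conversion to uint64_t is well-defined (it is reduced modulo 2^64) and the receiver can convert back the same way:

#include <stdint.h>

void serialize_i64(int64_t value, uint8_t *target) {
    /* well-defined conversion; preserves the two's-complement bit pattern */
    uint64_t source = (uint64_t)value;
    for (int i = 0; i < 8; i++) {
        target[i] = (uint8_t)(source >> (i * 8));
    }
}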


Deserialization is a bit more complex if reading one byte at a time in order:

uint8_t source[8] = ...;
uint64_t target = 0;

for (int i = 0; i < sizeof (uint64_t); i++) {
    target |= (uint64_t)source[i] << i * 8;
}

The cast to (uint64_t) is absolutely necessary, because the operands of << will undergo integer promotions, and uint8_t would always be converted to a signed int - and "funny" things will happen when you shift a set bit into the sign bit of a signed int.
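Wrapped into a function for symmetry (a sketch mirroring the loop above; the name deserialize is mine):

#include <inttypes.h>

uint64_t deserialize(const uint8_t *source) {
    uint64_t target = 0;
    for (int i = 0; i < 8; i++) {
        /* cast before shifting so the shift happens in uint64_t, as above */
        target |= (uint64_t)source[i] << i * 8;
    }
    return target;
}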


If you write the serialization into a function

#include <inttypes.h>

void serialize(uint64_t source, uint8_t *target) {
    target[0] = source;
    target[1] = source >> 8;
    target[2] = source >> 16;
    target[3] = source >> 24;
    target[4] = source >> 32;
    target[5] = source >> 40;
    target[6] = source >> 48;
    target[7] = source >> 56;
}

and compile for x86-64 using GCC 11 and -O3, the function will be compiled to

serialize:
        movq    %rdi, (%rsi)
        ret

which just moves the 64-bit value of source into the target array as-is. If you reverse the indices (7 ... 0; big-endian), GCC will be clever enough to recognize that too and will compile it (with -O3) to

serialize:
        bswap   %rdi
        movq    %rdi, (%rsi)
        ret

Upvotes: 2

dbush

Reputation: 223689

Most standardized network protocols specify numbers in big-endian format. In fact, big-endian is also referred to as network byte order, and there are functions specifically for translating integers of various sizes between host and network byte order.

These functions are htons and ntohs for 16-bit values and htonl and ntohl for 32-bit values. However, there is no standard equivalent for 64-bit values, and you're using little-endian for the network protocol, so these won't help you.

You can still, however, translate between the host byte order and the network byte order (little-endian in this case) without knowing the host order. You can do this by bit-shifting the relevant values into or out of the host numbers.

For example, to convert a 32 bit value from host to little endian and back to host:

uint32_t src_value = *some value*;
uint8_t buf[sizeof(uint32_t)];
int i;

for (i=0; i<sizeof(uint32_t); i++) {
    buf[i] = (src_value >> (8 * i)) & 0xff;
}

uint32_t dest_value = 0;

for (i=0; i<sizeof(uint32_t); i++) {
    dest_value |= (uint32_t)buf[i] << (8 * i);
}
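Packaged as small helper functions (a sketch; host_to_le32 and le32_to_host are names I've chosen, not standard functions):

#include <stdint.h>
#include <stddef.h>

/* host value -> little-endian byte buffer */
void host_to_le32(uint8_t buf[4], uint32_t value) {
    for (size_t i = 0; i < sizeof(uint32_t); i++) {
        buf[i] = (value >> (8 * i)) & 0xff;
    }
}

/* little-endian byte buffer -> host value */
uint32_t le32_to_host(const uint8_t buf[4]) {
    uint32_t value = 0;
    for (size_t i = 0; i < sizeof(uint32_t); i++) {
        value |= (uint32_t)buf[i] << (8 * i);
    }
    return value;
}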

Upvotes: 1
