Reputation: 20414

How do I read a little-endian 64-bit value from a byte buffer?

In a C application (not C++) I have a byte array of data received over the network. The array is 9 bytes long. The bytes 1 to 8 (zero-based) represent a 64-bit integer value as little endian. My CPU also uses little endian.

How can I convert these bytes from the array to an integer number?

I've tried this:

uint8_t rx_buffer[2000];
//recvfrom(sock, rx_buffer, sizeof(rx_buffer) - 1, ...)
int64_t sender_time_us = *(rx_buffer + 1);

But it gives me values like 89, 219, 234 or 27. The sender sees the values as 1647719702937548, 1647719733002117 or 1647719743790424. (These examples don't match, they're just random samples.)

Upvotes: 1

Answers (4)

dumbass

Reputation: 27235

The portable way to read a little-endian 64-bit value is very straightforward:

inline static uint64_t load_u64le(const void *p) {
    const unsigned char *q = p;
    uint64_t result = 0;
    result |= q[7]; result <<= 8;
    result |= q[6]; result <<= 8;
    result |= q[5]; result <<= 8;
    result |= q[4]; result <<= 8;
    result |= q[3]; result <<= 8;
    result |= q[2]; result <<= 8;
    result |= q[1]; result <<= 8;
    result |= q[0];
    return result;
}

inline static int64_t load_i64le(const void *p) {
    return (int64_t)load_u64le(p);
}

Simply invoke this helper function as load_i64le(rx_buffer + 1). Modern compilers can usually optimize it into a single load instruction on architectures where that is possible.
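As a rough sketch of how this could be wired into the receive path from the question: the function name parse_sender_time_us, the 9-byte minimum and the return convention are assumptions made for illustration, not part of the original code. The loop is just a compact spelling of the same shift-and-or logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable little-endian load: assembles the value byte by byte,
   so it works for any alignment and any host byte order. */
static inline uint64_t load_u64le(const void *p) {
    const unsigned char *q = p;
    uint64_t result = 0;
    for (int i = 7; i >= 0; i--) {
        result = (result << 8) | q[i];
    }
    return result;
}

/* Hypothetical helper: extracts the timestamp stored in bytes 1..8
   of a packet. Returns 0 on success, -1 if the packet is too short. */
static int parse_sender_time_us(const uint8_t *packet, size_t len,
                                int64_t *out) {
    if (len < 9) {
        return -1;
    }
    *out = (int64_t)load_u64le(packet + 1);
    return 0;
}
```

Checking the length returned by recvfrom before reading avoids interpreting stale buffer contents when a short datagram arrives.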

To read a 64-bit value where you specifically know the endianness agrees with the native ABI, you can use this:

inline static uint64_t load_u64(const void *p) {
    uint64_t result;
    memcpy(&result, p, sizeof(result));
    return result;
}

which has even better chances of being optimized into a simple load, assuming only that the compiler optimizes a short memcpy into an inline memory load.

For best results, you can combine the two approaches:

inline static uint64_t load_u64le(const void *p) {
    uint64_t result = 0;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    memcpy(&result, p, sizeof(result));
#else
    const unsigned char *q = p;
    result |= q[7]; result <<= 8;
    result |= q[6]; result <<= 8;
    result |= q[5]; result <<= 8;
    result |= q[4]; result <<= 8;
    result |= q[3]; result <<= 8;
    result |= q[2]; result <<= 8;
    result |= q[1]; result <<= 8;
    result |= q[0];
#endif
    return result;
}

Now, why you shouldn’t cast an offset pointer like the other answers suggest: first of all, because dereferencing a misaligned pointer is UB. Not every architecture supports reading words wider than 8 bits from arbitrary addresses, and even on those architectures that do support them, the compiler may still make the assumption that all dereferenced addresses are properly aligned when generating code, especially under optimizations. If you ever run your code with UBSan, it will also complain.

The second reason is strict aliasing. The C language stipulates that all memory must be accessed either via a pointer to a character type (char, signed char or unsigned char) or via a pointer to the type of the object stored in that memory; this ensures that pointers to different types can be assumed not to alias (point to the same memory). In practice, uint8_t is usually an alias of unsigned char, which is a character type, exceptionally allowed to alias any type; this makes the strict aliasing concern mostly theoretical, so far. Nevertheless, there is no reason to take that risk either, when avoiding it is so easy and cheap.

Upvotes: 1

ikegami

Reputation: 386676

Unsafe solution:

int64_t sender_time_us = *(int64_t*)(rx_buffer + 1);

This is potentially an alignment violation, and it's a strict aliasing rule violation. It's undefined behaviour. On some machines, this can kill your program with a bus error.

Safe solution:

int64_t sender_time_us;
memcpy( &sender_time_us, rx_buffer + 1, sizeof( int64_t ) );

@Nate Eldredge points out that while this solution may look inefficient, a decent compiler should optimize this into something efficient. The net effect will be (a) to force the compiler to properly handle the unaligned access, if the target needs any special handling, (b) to make the compiler properly understand the aliasing and prevent any optimizations that would break it. For a target that is able to handle unaligned accesses normally, the generated code may not change at all.
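A minimal sketch of the memcpy approach wrapped in a function, assuming (as the question states) that the wire byte order matches the host's. The name load_i64_native is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper around the memcpy idiom. It copies 8 bytes from
   an arbitrarily aligned address into a properly aligned int64_t.
   Assumption: the bytes are already in the host's native byte order. */
static int64_t load_i64_native(const uint8_t *p) {
    int64_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

It would be called as load_i64_native(rx_buffer + 1). On a big-endian host this returns the bytes in the wrong order, so it is only a drop-in replacement when both ends really do agree on endianness.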

Upvotes: 4

SGeorgiades

Reputation: 1821

You need to cast your pointer, like so:

int64_t sender_time_us = *(int64_t*)(rx_buffer + 1);

As it is, you're only getting one byte of data.

Upvotes: -1

Jim Rhodes

Reputation: 5095

Your code is only getting a single uint8_t. You would need to cast to int64_t first. Something like this:

int64_t* pBuffer = (int64_t*)(rx_buffer + 1);
int64_t sender_time_us = *pBuffer;

But you should be aware that some CPUs cannot access 64-bit values that are not aligned. It may also be OK to do this if you know the endianness, but it would be better to handle it in a more portable way.

Upvotes: 1
