Willis Hershey
Willis Hershey

Reputation: 1574

Endianness conversion without relying on undefined behavior

I am using C to read a .png image file, and if you're not familiar with the PNG encoding format, useful integer values are encoded in .png files in the form of 4-byte big-endian integers.

My computer is a little-endian machine, so to convert from a big-endian uint32_t that I read from the file with fread() to a little-endian one my computer understands, I've been using this little function I wrote:

#include <stdint.h>

uint32_t convertEndian(uint32_t val){
  union{
    uint32_t value;
    char bytes[sizeof(uint32_t)];
  }in,out;
  in.value=val;
  for(int i=0;i<sizeof(uint32_t);++i)
    out.bytes[i]=in.bytes[sizeof(uint32_t)-1-i];
  return out.value;
}

This works beautifully on my x86_64 UNIX environment, gcc compiles without error or warning even with the -Wall flag, but I feel rather confident that I'm relying on undefined behavior and type-punning that may not work as well on other systems.

Is there a standard function I can call that can reliably convert a big-endian integer to one the native machine understands, or if not, is there an alternative safer way to do this conversion?

Upvotes: 1

Views: 458

Answers (4)

Luis Colorado
Luis Colorado

Reputation: 12708

This code reads a uint32_t from a pointer of uchar_t in big endian storage, independently of the endianness of your architecture. (The code just acts as if it was reading a base 256 number)

uint32_t read_bigend_int(uchar_t *p, int sz)
{
    uint32_t result = 0;
    while(sz--) {
        result <<= 8;   /* multiply by base */
        result |= *p++; /* and add the next digit */
    }
}

if you call, for example:

int main()
{
    /* ... */
    uchar_t buff[1024];
    read(fd, buff, sizeof buff);

    uint32_t value = read_bigend_int(buff + offset, sizeof value);
    /* ... */
}

Upvotes: 0

Willis Hershey
Willis Hershey

Reputation: 1574

This solution doesn't rely on accessing inactive members of a union, but relies instead on unsigned integer bit-shift operations which can portably and safely convert from big-endian to little-endian or vice versa

#include <stdint.h>

uint32_t convertEndian32(uint32_t in){
  return ((in&0xffu)<<24)|((in&0xff00u)<<8)|((in&0xff0000u)>>8)|((in&0xff000000u)>>24);
}

Upvotes: 0

M.M
M.M

Reputation: 141648

IMO it'd be better style to read from bytes into the desired format, rather than apparently memcpy'ing a uint32_t and then internally manipulating the uint32_t. The code might look like:

uint32_t read_be32(uint8_t *src)   // must be unsigned input
{
     return (src[0] * 0x1000000u) + (src[1] * 0x10000u) + (src[2] * 0x100u) + src[3];
}

It's quite easy to get this sort of code wrong, so make sure you get it from high rep SO users 😉. You may often see the alternative suggestion return (src[0] << 24) + (src[1] << 16) + (src[2] << 8) + src[3]; however, that causes undefined behaviour if src[0] >= 128 due to signed integer overflow , due to the unfortunate rule that the integer promotions take uint8_t to signed int. And also causes undefined behaviour on a system with 16-bit int due to large shifts.

Modern compilers should be smart enough to optimize, this, e.g. the assembly produced by clang little-endian is:

read_be32:                              # @read_be32
    mov     eax, dword ptr [rdi]
    bswap   eax
    ret

However I see that gcc 10.1 produces a much more complicated code, this seems to be a surprising missed optimization bug.

Upvotes: 2

chux
chux

Reputation: 154169

I see no real UB in OP's code.

Portability issues: yes.

"type-punning that may not work as well on other systems" is not a problem with OP's C code yet may cause trouble with other languages.


Yet how about a big (PNG) endian to host instead?

Extract the bytes by address (lowest address which has the MSByte to highest address which has the LSByte - "big" endian) and form the result with the shifted bytes.

Something like:

uint32_t Endian_BigToHost32(uint32_t val) {
  union {
    uint32_t u32;
    uint8_t u8[sizeof(uint32_t)]; // uint8_t insures a byte is 8 bits.
  } x = { .u32 = val };
  return 
      ((uint32_t)x.u8[0] << 24) |
      ((uint32_t)x.u8[1] << 16) |
      ((uint32_t)x.u8[2] <<  8) |
                 x.u8[3];
}

Tip: many libraries have a implementation specific function to efficiently to this. Example be32toh.

Upvotes: 3

Related Questions