Brendan Rius

Reputation: 610

Changing endianness: is a union more efficient than bitshifts?

As a challenge, I was asked to change the endianness of an int. My idea was to use bitshifts:

int    swap_endianess(int color)
{
    int a;
    int r;
    int g;
    int b;

    a = (color & (255 << 24)) >> 24;
    r = (color & (255 << 16)) >> 16;
    g = (color & (255 << 8)) >> 8;
    b = (color & 255);
    return (b << 24 | g << 16 | r << 8 | a);
}

But someone told me that it would be easier to use a union containing an int and an array of four chars (assuming an int is stored in 4 chars): fill in the int, then reverse the array.

union   u_color
{
  int   color;
  char  c[4];
};

int             swap_endianess(int color)
{
  union u_color ucol;
  char          tmp;

  ucol.color = color;
  tmp = ucol.c[0];
  ucol.c[0] = ucol.c[3];
  ucol.c[3] = tmp;
  tmp = ucol.c[1];
  ucol.c[1] = ucol.c[2];
  ucol.c[2] = tmp;
  return (ucol.color);
}

Which of these two is the more efficient way of swapping bytes? Are there even more efficient ways of doing this?

EDIT

I tested this on an i7 with 2,000,000,000 iterations (measured with the time command): the union way takes about 24 seconds, while the bitshift way takes about 15 seconds. The thing is that if I compile with -O1, both methods take only 1 second, and 0.001 second with -O2 or -O3.

The bitshift method compiles to bswap in ASM with -O2 and -O3, but the union way does not. gcc seems to recognize the naive pattern but not the more complicated union way of doing it. To conclude, read the bottom line of @user3386109's answer.
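For anyone who wants to reproduce the comparison, here is a minimal sketch of both variants written with uint32_t and unsigned char, which sidesteps the sign-extension problems of shifting a signed int. The names swap_shift, swap_union and check_variants_agree are made up for this example. It assumes uint32_t occupies exactly 4 bytes, and reading the other union member like this is fine in C but not in C++.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-mask variant on an unsigned type (no sign extension). */
uint32_t swap_shift(uint32_t v)
{
    return ((v >> 24) & 0xffu)
         | ((v >>  8) & 0xff00u)
         | ((v <<  8) & 0xff0000u)
         | ((v << 24) & 0xff000000u);
}

/* Union variant: view the same 4 bytes as an array and reverse it. */
uint32_t swap_union(uint32_t v)
{
    union { uint32_t word; unsigned char bytes[4]; } u;
    unsigned char tmp;

    u.word = v;
    tmp = u.bytes[0]; u.bytes[0] = u.bytes[3]; u.bytes[3] = tmp;
    tmp = u.bytes[1]; u.bytes[1] = u.bytes[2]; u.bytes[2] = tmp;
    return u.word;
}

/* Sanity check: both variants must produce the same result. */
void check_variants_agree(uint32_t v)
{
    assert(swap_shift(v) == swap_union(v));
}
```

Calling check_variants_agree on a handful of values before timing anything makes sure the two loops being compared really compute the same thing.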

Upvotes: 2

Views: 480

Answers (2)

fuz

Reputation: 93127

You can also use this code, which might be slightly more efficient:

#include <stdint.h>

extern uint32_t
change_endianness(uint32_t x)
{
    x = (x & 0x0000FFFFLU) << 16 | (x & 0xFFFF0000LU) >> 16;
    x = (x & 0x00FF00FFLU) <<  8 | (x & 0xFF00FF00LU) >>  8;
    return (x);
}

This is compiled by gcc on amd64 to the following assembly:

change_endianness:
    roll $16, %edi
    movl %edi, %eax
    andl $16711935, %edi
    andl $-16711936, %eax
    salq $8, %rdi
    sarq $8, %rax
    orl  %edi, %eax
    ret

To get an even better result, you might want to use inline assembly. The i386 and amd64 architectures provide a bswap instruction that does what you want. As user3386109 explained, compilers might recognize the “naïve” approach and emit a bswap instruction, something that doesn't happen with the code above. The code above is, however, the better choice when the compiler is not smart enough to detect that it can use bswap.
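If writing inline assembly by hand is unappealing, GCC and Clang also offer the __builtin_bswap32 builtin, which typically ends up as a single bswap on x86. A sketch, assuming one of those compilers; the wrapper name swap32 and the fallback branch are additions for illustration only:

```c
#include <stdint.h>

/* Hypothetical wrapper: use the compiler builtin where available,
   otherwise fall back to the two-step mask-and-shift from above. */
static inline uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);   /* GCC/Clang builtin */
#else
    x = (x & 0x0000FFFFu) << 16 | (x & 0xFFFF0000u) >> 16;
    x = (x & 0x00FF00FFu) <<  8 | (x & 0xFF00FF00u) >>  8;
    return x;
#endif
}
```

Compared with inline assembly, this keeps the source portable across the architectures those compilers target and still lets the optimizer fold constants.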

Upvotes: 2

user3386109

Reputation: 34839

Here is the correct code for a byte swap function:

#include <stdint.h>

uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >>  8) & 0xff;
    a =  value        & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

Here's a function that tests the byte swap function:

void testEndianess( void )
{
    uint32_t value = arc4random();
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}

Using the LLVM compiler with full optimization, the resulting assembly code for the testEndianess function is:

0x93d0:  calll  0xc82e                    ; call `arc4random`
0x93d5:  movl   %eax, %ecx                ; copy `value` into register CX
0x93d7:  bswapl %ecx                      ; <--- this is the `changeEndianess` function
0x93d9:  movl   %ecx, 0x8(%esp)           ; put 'result' on the stack
0x93dd:  movl   %eax, 0x4(%esp)           ; put 'value' on the stack
0x93e1:  leal   0x6536(%esi), %eax        ; compute address of the format string
0x93e7:  movl   %eax, (%esp)              ; put the format string on the stack
0x93ea:  calll  0xc864                    ; call 'printf'

In other words, the LLVM compiler recognizes the entire changeEndianess function and implements it as a single bswapl instruction.


Side note for those wondering why the call to arc4random is necessary. Given this code

void testEndianess( void )
{
    uint32_t value = 0x11223344;
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}

the compiler generates this assembly

0x93dc:  leal   0x6524(%eax), %eax        ; compute address of format string 
0x93e2:  movl   %eax, (%esp)              ; put the format string on the stack
0x93e5:  movl   $0x44332211, 0x8(%esp)    ; put 'result' on the stack
0x93ed:  movl   $0x11223344, 0x4(%esp)    ; put 'value' on the stack
0x93f5:  calll  0xc868                    ; call 'printf'

In other words, given a hardcoded value as input, the compiler precomputes the result of the changeEndianess function, and puts that directly into the assembly code, bypassing the function entirely.
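Where arc4random is not available, another way to keep the compiler from folding the constant is to read the input through a volatile object. A sketch under that assumption; the helper name swapRuntimeValue is made up here, and changeEndianess is repeated from above so the snippet compiles on its own:

```c
#include <stdint.h>

uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >>  8) & 0xff;
    a =  value        & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* Hypothetical helper: the read of a volatile object has to happen at
   run time, so the compiler should not precompute the swapped value. */
uint32_t swapRuntimeValue( void )
{
    volatile uint32_t input = 0x11223344;
    return changeEndianess( input );
}
```

The result is still 0x44332211, but the swap itself should now show up in the generated assembly instead of a precomputed constant.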


The bottom line. Write your code the way it makes sense to write your code, and let the compiler do the optimizing. Compilers these days are amazing. Using tricky optimizations in source code (e.g. unions) may defeat the optimizations built into the compiler, actually resulting in slower code.

Upvotes: 3
