Reputation: 610
As a challenge, I was asked to change the endianness of an int. My idea was to use bit shifts:
int swap_endianess(int color)
{
    int a;
    int r;
    int g;
    int b;

    a = (color & (255 << 24)) >> 24;
    r = (color & (255 << 16)) >> 16;
    g = (color & (255 << 8)) >> 8;
    b = (color & 255);
    return (b << 24 | g << 16 | r << 8 | a);
}
But someone told me that it would be easier to use a union containing an int and an array of four chars (assuming an int is stored in 4 bytes), fill in the int, and then reverse the array:
union u_color
{
    int color;
    char c[4];
};

int swap_endianess(int color)
{
    union u_color ucol;
    char tmp;

    ucol.color = color;
    tmp = ucol.c[0];
    ucol.c[0] = ucol.c[3];
    ucol.c[3] = tmp;
    tmp = ucol.c[1];
    ucol.c[1] = ucol.c[2];
    ucol.c[2] = tmp;
    return (ucol.color);
}
Which of these two is the more efficient way to swap bytes? Are there even more efficient ways of doing this?
After testing on an i7, the union version takes about 24 seconds (measured with the time command), while the bit-shift version takes about 15 seconds for 2,000,000,000 iterations.
The thing is that if I compile with -O1, both methods take only 1 second, and 0.001 second with -O2 or -O3.
The bit-shift method compiles to bswap in the assembly with -O2 and -O3, but the union method does not; gcc seems to recognize the naive pattern but not the more complicated union version. To conclude, read the bottom line of @user3386109's answer.
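The timing loop itself is not shown above. A minimal sketch of how such a measurement could be set up follows; `bench_swap`, `swap_shift`, the `volatile` sink, and the use of `clock()` are assumptions for illustration, not the code that produced the numbers quoted here. Storing each result through a `volatile` variable keeps the optimizer from discarding the loop, which is one plausible explanation for a 0.001 second reading.

```c
#include <stdint.h>
#include <time.h>

/* Shift-based swap, written with unsigned arithmetic so every shift is well defined. */
static uint32_t swap_shift(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical harness: runs `iterations` swaps and returns elapsed CPU seconds.
   The last result is written to *last so a caller can check it. */
static double bench_swap(uint32_t iterations, uint32_t *last)
{
    volatile uint32_t sink = 0;   /* volatile store: the loop cannot be discarded */
    clock_t start = clock();

    for (uint32_t i = 0; i < iterations; i++)
        sink = swap_shift(i);

    clock_t end = clock();
    *last = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `bench_swap(2000000000u, &last)` once per implementation and printing the returned seconds would give numbers comparable across optimization levels, since the loop body has to run in every case.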
Upvotes: 2
Views: 480
Reputation: 93127
You can also use this code which might be slightly more efficient:
#include <stdint.h>

extern uint32_t
change_endianness(uint32_t x)
{
    x = (x & 0x0000FFFFLU) << 16 | (x & 0xFFFF0000LU) >> 16;
    x = (x & 0x00FF00FFLU) <<  8 | (x & 0xFF00FF00LU) >>  8;
    return (x);
}
This is compiled by gcc on amd64 to the following assembly:
change_endianness:
roll $16, %edi
movl %edi, %eax
andl $16711935, %edi
andl $-16711936, %eax
salq $8, %rdi
sarq $8, %rax
orl %edi, %eax
ret
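To see why two steps are enough, it can help to trace one value through them. The sketch below splits the function into its two halves; the names `swap_halves`, `swap_bytes_in_halves`, and `change_endianness_traced` are made up for illustration, and a plain `u` suffix is used instead of `LU` so the arithmetic stays 32 bits wide.

```c
#include <stdint.h>

/* Step 1: exchange the upper and lower 16-bit halves (a rotate by 16). */
static uint32_t swap_halves(uint32_t x)
{
    return ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
}

/* Step 2: exchange the two bytes inside each 16-bit half. */
static uint32_t swap_bytes_in_halves(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

/* 0x11223344 -> 0x33441122 after step 1 -> 0x44332211 after step 2. */
static uint32_t change_endianness_traced(uint32_t x)
{
    return swap_bytes_in_halves(swap_halves(x));
}
```

Step 1 corresponds to the `roll $16` at the top of the listing; step 2 is the pair of masks, shifts, and the final `orl`.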
To get an even better result, you might want to use inline assembly. The i386 and amd64 architectures provide a bswap instruction that does what you want. As user3386109 explained, compilers might recognize the “naïve” approach and emit bswap instructions, something that doesn't happen with the approach above. The approach above is, however, the better choice when the compiler is not smart enough to detect that it can use bswap.
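As a rough sketch of the inline-assembly route, assuming GCC or Clang extended `asm` syntax on an x86 target: the function name `bswap32_asm` is hypothetical, and the fallback branch reuses the two-step swap so the code still builds on other compilers or architectures.

```c
#include <stdint.h>

static inline uint32_t bswap32_asm(uint32_t x)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    /* "+r": x is both input and output and is kept in a general-purpose register. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Portable fallback (assumption: no x86 inline asm available here). */
    x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    x = ((x & 0x00FF00FFu) << 8)  | ((x & 0xFF00FF00u) >> 8);
    return x;
#endif
}
```

GCC and Clang also provide `__builtin_bswap32`, which gives the compiler the same hint without tying the source to one instruction set.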
Upvotes: 2
Reputation: 34839
Here is the correct code for a byte swap function:
#include <stdint.h>

uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >> 8) & 0xff;
    a = value & 0xff;
    return (a << 24) | (b << 16) | (g << 8) | r;
}
Here's a function that tests the byte swap function:
#include <stdio.h>
#include <stdlib.h>   /* arc4random (BSD, macOS) */

void testEndianess( void )
{
    uint32_t value = arc4random();
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}
Using the LLVM compiler with full optimization, the resulting assembly code for the testEndianess function is
0x93d0: calll 0xc82e ; call `arc4random`
0x93d5: movl %eax, %ecx ; copy `value` into register CX
0x93d7: bswapl %ecx ; <--- this is the `changeEndianess` function
0x93d9: movl %ecx, 0x8(%esp) ; put 'result' on the stack
0x93dd: movl %eax, 0x4(%esp) ; put 'value' on the stack
0x93e1: leal 0x6536(%esi), %eax ; compute address of the format string
0x93e7: movl %eax, (%esp) ; put the format string on the stack
0x93ea: calll 0xc864 ; call 'printf'
In other words, the LLVM compiler recognizes the entire changeEndianess function and implements it as a single bswapl instruction.
Side note for those wondering why the call to arc4random is necessary. Given this code
void testEndianess( void )
{
    uint32_t value = 0x11223344;
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}
the compiler generates this assembly
0x93dc: leal 0x6524(%eax), %eax ; compute address of format string
0x93e2: movl %eax, (%esp) ; put the format string on the stack
0x93e5: movl $0x44332211, 0x8(%esp) ; put 'result' on the stack
0x93ed: movl $0x11223344, 0x4(%esp) ; put 'value' on the stack
0x93f5: calll 0xc868 ; call 'printf'
In other words, given a hardcoded value as input, the compiler precomputes the result of the changeEndianess function and puts that directly into the assembly code, bypassing the function entirely.
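The pair of constants in that listing doubles as a handy test vector. A small check along these lines confirms the expected value and that swapping twice restores the input; the name `checkEndianess` is made up here, and `changeEndianess` is repeated (as a `static` function) only so the snippet compiles on its own.

```c
#include <stdint.h>

/* Same logic as the byte swap function above, repeated for self-containment. */
static uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >> 8) & 0xff;
    a = value & 0xff;
    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* Hypothetical helper: returns 1 when both checks pass, 0 otherwise. */
static int checkEndianess( void )
{
    uint32_t value = 0x11223344;
    uint32_t result = changeEndianess( value );

    /* Known answer from the listing, plus the round trip back to the input. */
    return result == 0x44332211 && changeEndianess( result ) == value;
}
```

With optimization enabled, a compiler may well fold this whole check down to a constant, for the same reason the hardcoded example is precomputed.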
The bottom line. Write your code the way it makes sense to write your code, and let the compiler do the optimizing. Compilers these days are amazing. Using tricky optimizations in source code (e.g. unions) may defeat the optimizations built into the compiler, actually resulting in slower code.
Upvotes: 3