Reputation: 6875
Recently I had to write code for critical real-time functionality, and I used a few __builtin_... functions. I understand that such code is not portable because not all compilers support the "__builtin_..." functions or syntax. I was wondering whether there is a way to write the code in plain C so that the compiler can recognize it and use some internal "__builtin_..."-like function.
Below is a description of a small experiment I did, but my question is:
Are there any tips, best known methods, guidelines to write a portable C code so that the compiler would be able to detect (let's put aside the compiler bugs) the pattern and use the maximum ability of the target CPU architecture.
For example, take reversing the bytes in a Dword (so that the first byte becomes the last one, the last one becomes the first one, and so on). The x86_64 architecture has a dedicated assembly instruction for it: bswap. I tried 4 different options:
#include <stdint.h>
#include <stdlib.h>

typedef union _helper_s
{
    uint32_t val;
    uint8_t bytes[4];
} helper_u;

uint32_t reverse(uint32_t d)
{
    helper_u b;
    uint8_t temp;
    b.val = d;
    temp = b.bytes[0];
    b.bytes[0] = b.bytes[3];
    b.bytes[3] = temp;
    temp = b.bytes[1];
    b.bytes[1] = b.bytes[2];
    b.bytes[2] = temp;
    return b.val;
}

uint32_t reverse1(uint32_t d)
{
    helper_u b;
    uint8_t temp;
    b.val = d;
    for (size_t i = 0; i < sizeof(uint32_t) / 2; i++)
    {
        temp = b.bytes[i];
        b.bytes[i] = b.bytes[sizeof(uint32_t) - i - 1];
        b.bytes[sizeof(uint32_t) - i - 1] = temp;
    }
    return b.val;
}

uint32_t reverse2(uint32_t d)
{
    return (d << 24) | (d >> 24) | ((d & 0xFF00) << 8) | ((d & 0xFF0000) >> 8);
}

uint32_t reverse3(uint32_t d)
{
    return __builtin_bswap32(d);
}
All the options provide the same functionality. I compiled them with different compilers and different optimization levels, and the results were not so good:
GCC did great! For both the -O3 and -Os optimization levels it gave the same result for all the functions:
reverse:
        mov     eax, edi
        bswap   eax
        ret
reverse1:
        mov     eax, edi
        bswap   eax
        ret
reverse2:
        mov     eax, edi
        bswap   eax
        ret
reverse3:
        mov     eax, edi
        bswap   eax
        ret
Clang disappointed me a little. With -O3 it gave the same result as GCC; however, with -Os it completely lost its way in reverse1. It didn't recognize the pattern and produced a far less optimal binary:
reverse1:                               # @reverse1
        lea     rax, [rsp - 8]
        mov     dword ptr [rax], edi
        mov     ecx, 3
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
        mov     sil, byte ptr [rax]
        mov     dl, byte ptr [rsp + rcx - 8]
        mov     byte ptr [rax], dl
        mov     byte ptr [rsp + rcx - 8], sil
        dec     rcx
        inc     rax
        cmp     rcx, 1
        jne     .LBB1_1
        mov     eax, dword ptr [rsp - 8]
        ret
Actually, the difference between reverse and reverse1 is that reverse is the "loop unrolled" version of reverse1, so I assume that with -Os the compiler didn't even try to unroll the for loop or to anticipate its purpose.
With ICC, things went even worse: it was unable to recognize the pattern in the reverse and reverse1 functions at both the -O3 and the -Os optimization levels.
P.S.
I often hear people say that code has to be written so that even a junior programmer can easily understand it, and that modern compilers are "smart" enough to take care of the optimizations. Now I have evidence that this is not true (or at least not always true).
Upvotes: 1
Views: 141
Reputation: 70412
The technique used for reverse2 is fairly idiomatic (here, for example), and your own testing showed that it is properly optimized on all the systems you tested on. To make the implementation easier to understand, you can introduce more whitespace, and follow a more regular pattern.
uint32_t reverse2(uint32_t d)
{
    return ((d & 0x000000FFU) << 24) |
           ((d & 0x0000FF00U) <<  8) |
           ((d & 0x00FF0000U) >>  8) |
           ((d & 0xFF000000U) >> 24);
}
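A quick way to convince yourself that the masked form is correct is a small self-check. This is only a minimal sketch: reverse2_selfcheck is a hypothetical name, and the test values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Masked shift-and-OR byte reversal, one byte per line. */
uint32_t reverse2(uint32_t d)
{
    return ((d & 0x000000FFU) << 24) |
           ((d & 0x0000FF00U) <<  8) |
           ((d & 0x00FF0000U) >>  8) |
           ((d & 0xFF000000U) >> 24);
}

/* Hypothetical helper: checks a known value and a round trip. */
void reverse2_selfcheck(void)
{
    /* Bytes 11 22 33 44 must come back as 44 33 22 11. */
    assert(reverse2(0x11223344U) == 0x44332211U);
    /* Reversing twice must give back the original value. */
    assert(reverse2(reverse2(0xDEADBEEFU)) == 0xDEADBEEFU);
}
```

Calling reverse2_selfcheck() once at start-up (or from a unit test) is enough; with NDEBUG defined the assertions compile away.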
To your specific points:
Are there any tips, best known methods, guidelines to write a portable C code so that the compiler would be able to detect (let's put aside the compiler bugs) the pattern and use the maximum ability of the target CPU architecture.
The key takeaway should be to try to write idiomatic code. Judging code to be understandable is somewhat subjective. What may seem clear to me can appear incomprehensible to someone else (and vice versa). However, there are common idioms in C programming that should be followed whenever it is appropriate to do so.
Unfortunately, I do not have a handy list of idioms off the top of my head. But I can say I largely learned C from reading The C Programming Language (by K & R, of course), and I was an avid reader of C Programming FAQs (by Steve Summit).
However, a very good resource for C idioms can be found by reading and comprehending open source C projects, and of course the source code base of the company you work at. Following the latter has the added benefit that any code you add that follows existing conventions will naturally increase the chances of it being understood by someone else in the company.
I often hear people say that the code has to be written so that even junior programmer would easily be able to understand it and the modern compilers are "smart" enough to take care of the optimizations. Now I have an evidence that it is not true (or at least not always true).
Compilers are just programs, so they cannot read your mind. The compiler will be programmed to look for particular patterns in the AST and apply optimizations to transform the tree into what it considers more optimal. Similarly, the peephole optimizer will look for patterns in the generated machine instructions, and then transform them into fewer equivalent instructions.
But these transformations are only possible if the generated tree or generated instructions follow a recognizable pattern. And these patterns are often determined by analyzing real-world software to see what kind of code gets generated for certain operations. If your source does not produce a tree or instruction sequence that the compiler recognizes, you may be partially losing out on the compiler's help to optimize.
Thus, another reason to try to write idiomatic C code.
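As one more illustration of a commonly written pattern (this example is not from the question, and load_be32 is a hypothetical name), here is a minimal sketch of reading a big-endian 32-bit value from a byte buffer with shifts and ORs. Optimizing compilers can often collapse this form into a single load plus bswap on x86_64, but that is an assumption worth confirming in the generated assembly for your compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from 4 big-endian bytes.
 * Each byte is widened to uint32_t before shifting so that the shift
 * never happens on a (signed) promoted int. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
           ((uint32_t)p[3]);
}
```

The result does not depend on the byte order of the host, so the same source stays correct on both little-endian and big-endian targets.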
Now, it can be argued that forcing oneself to write idiomatic C is a form of micro-optimization. Should you try to teach the compiler how to optimize the way you write code, or let the compiler teach you how to write code it knows how to optimize? However, the momentum is carried by the existing C programmers that write code idiomatically. New C programmers adopt these idioms for the sake of writing code more easily understood by the people that will be reviewing their code.
Upvotes: 1
Reputation: 1582
As far as I am aware, the proper way to do this is with conditional compilation.
My suggestion is to write plain normal code in standard C as the default, both for maintainability and as a fall-back path that all compilers can handle. Utilize conditional compilation only as necessary to optimize for specific compilers, with a comment explaining the reason for the exception.
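A minimal sketch of that approach for the byte-swap case might look like the following. The wrapper name bswap32_portable is hypothetical, and the compiler checks are assumptions about which toolchains you need to support: __builtin_bswap32 is the GCC/Clang builtin from the question, and _byteswap_ulong is the MSVC intrinsic declared in <stdlib.h>.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <stdlib.h>   /* declares _byteswap_ulong on MSVC */
#endif

/* Hypothetical wrapper: reverse the byte order of a 32-bit value. */
uint32_t bswap32_portable(uint32_t d)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Exception for GCC/Clang: use the builtin directly so that the
     * result does not depend on pattern recognition at -Os. */
    return __builtin_bswap32(d);
#elif defined(_MSC_VER)
    /* Exception for MSVC: unsigned long is 32 bits wide there. */
    return (uint32_t)_byteswap_ulong((unsigned long)d);
#else
    /* Default path: plain standard C that every compiler can handle. */
    return ((d & 0x000000FFU) << 24) |
           ((d & 0x0000FF00U) <<  8) |
           ((d & 0x00FF0000U) >>  8) |
           ((d & 0xFF000000U) >> 24);
#endif
}
```

Keeping the conditional code inside one small function means the rest of the code base only ever calls the wrapper, and the standard C branch remains available as a reference implementation.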
Upvotes: 1