Reputation: 108
Are the following functions executed in a single clock cycle?
__builtin_popcount
__builtin_ctz
__builtin_clz
also what is the no of clock cycles for the ll(64 bit) version of the same. are they portable. why or why not?
Upvotes: 2
Views: 5770
Reputation: 6586
Do these functions execute in a single clock-cycle?
Not necessarily. On architectures where they can be implemented with a single instruction, they will typically be the fastest way to compute that function (but still not necessarily a single clock cycle). On architectures where they cannot be implemented as a single instruction, their performance is less certain.
On my processor (a Core 2 Duo), __builtin_ctz
and __builtin_clz
can be implemented with a single instruction (Bit Scan Forward and Bit Scan Reverse). However, __builtin_popcount
cannot be implemented with a single instruction on my processor. For __builtin_popcount
, gcc 4.7.2 calls a library function, while clang 3.1 generates an inline instruction sequence (implementing this bit twiddling hack). Clearly, the performance of those two implementations will not be the same.
Are they portable?
They are not portable across compilers. They originated with GCC (as far as I know), and are also implemented in some other compilers such as Clang.
Compilers that do support these functions may provide them for multiple architectures, but implementation quality (performance) is likely to vary.
__builtin
functions like this are used to access specific machine instructions in a somewhat easier way than using inline assembly. If you need to achieve the highest performance and are willing to sacrifice portability to do so or to provide an alternate implementation for compilers or platforms where these functions are not provided, then it makes sense to use them. If optimal low level performance is your goal you should also check the assembly output of the compiler, to determine whether it really is generating the instruction that you expect it to use.
Upvotes: 9
Reputation: 78923
You can get a first idea of what your compiler does with it by compiling it with -O3 -march=native -S
into assembler code. There you can check if this resolves to just one assembler statement. If so, this is not a guarantee that this is done in one cycle. To know the real cost, you'd have to measure.
Upvotes: 3