Reputation: 2124
How can I count the number of leading zeros in a 128-bit integer (uint128_t
) efficiently?
I know GCC's built-in functions:
__builtin_clz
, __builtin_clzl
, __builtin_clzll
__builtin_ffs
, __builtin_ffsl
, __builtin_ffsll
However, these functions only work with 32- and 64-bit integers.
I also found some SSE instructions:
__lzcnt16
, __lzcnt
, __lzcnt64
As you may guess, these only work with 16-, 32- and 64-bit integers.
Is there any similar, efficient built-in functionality for 128-bit integers?
Upvotes: 5
Views: 6223
Reputation: 3998
Yakk's answer works well for all kinds of targets as long as gcc supports 128 bit integers for the target. However, note that on the x86-64 platform, with an Intel Haswell processor or newer, there is a more efficient solution:
#include <immintrin.h>
#include <stdint.h>
// tested with compiler options: gcc -O3 -Wall -m64 -mlzcnt
inline int lzcnt_u128 (unsigned __int128 u) {
uint64_t hi = u>>64;
uint64_t lo = u;
lo = (hi == 0) ? lo : -1ULL;
return _lzcnt_u64(hi) + _lzcnt_u64(lo);
}
The _lzcnt_u64 intrinsic compiles (gcc 5.4) to the lzcnt instruction, which is well defined for a zero input (it returns 64), in contrary to gcc's __builtin_clzll(). The ternary operator compiles to the cmove instruction.
Upvotes: 5
Reputation: 275760
inline int clz_u128 (uint128_t u) {
uint64_t hi = u>>64;
uint64_t lo = u;
int retval[3]={
__builtin_clzll(hi),
__builtin_clzll(lo)+64,
128
};
int idx = !hi + ((!lo)&(!hi));
return retval[idx];
}
this is a branch free variant. Note that more work is done than in the branchy solution, and in practice the branching will probably be predictable.
It also relies on __builtin_clzll
not crashing when fed 0: the docs say the result is undefined, but is it just unspecified or undefined?
Upvotes: 6
Reputation: 22328
Assuming a 'random' distribution, the first non-zero bit will be in the high 64 bits, with an overwhelming probability, so it makes sense to test that half first.
Have a look at the code generated for:
/* inline */ int clz_u128 (uint128_t u)
{
unsigned long long hi, lo; /* (or uint64_t) */
int b = 128;
if ((hi = u >> 64) != 0) {
b = __builtin_clzll(hi);
}
else if ((lo = u & ~0ULL) != 0) {
b = __builtin_clzll(lo) + 64;
}
return b;
}
I would expect gcc to implement each __builtin_clzll
using the bsrq
instruction - bit scan reverse, i.e., most-significant bit position - in conjunction with an xor
, (msb ^ 63)
, or sub
, (63 - msb)
, to turn it into a leading zero count. gcc might generate lzcnt
instructions with the right -march=
(architecture) options.
Edit: others have pointed out that the 'distribution' is not relevant in this case, since the HI uint64_t needs to be tested regardless.
Upvotes: 4