Reputation: 2736
Suppose we are trying to remove the trailing zeroes from some unsigned variable.
uint64_t a = ...
uint64_t last_bit = a & -a; // Two's complement trick: last_bit holds the trailing bit of a
a /= last_bit; // Removing all trailing zeroes from a.
I noticed that it's faster to manually count the bits and shift. (MSVC compiler with optimizations on)
uint64_t a = ...
uint64_t last_bit = a & -a;
size_t last_bit_index = _BitScanForward64( last_bit );
a >>= last_bit_index
Are there any further quick tricks that would make this even faster, assuming that the compiler intrinsic _BitScanForward64
is faster than any of the alternatives?
Upvotes: 0
Views: 801
Reputation: 1
Calculate the trailing zeros
tz = (int)(log2((n & (n - 1)) ^ n)
n = 160 is not power of two
bits n: 01010000
bits n - 1: 10011111
and --------
00010000
bits n: 01010000
xor --------
00010000 get leftmost bit (32)
log(32) = 5
5 trailing zeros
its the same using __builtin_ctz in gcc
but __builtin_ctz is better (cpu proccess)
__builtin_ctz
__builtin_ctzl long
__builtin_clz left zeros
__builtin_clzl left zeros for long
Upvotes: -2
Reputation: 10604
You can use std::countr_zero
(c++20) and rely on the compiler to optimize it.
a >>= std::countr_zero(a);
(bonus: you don't need to specify the width and it works with any unsigned integer type)
Upvotes: 5
Reputation: 13719
On x86, _tzcnt_u64
is a faster alterative of _BitScanForward64
, if it is available (it is available with BMI instruction set).
Also, you can directly use that on the input, you don't need to isolate lowest bit set, as pointed out by @AlanBirtles in a comment.
Other than that, noting can be done for a single variable. For an array of them, there may be a SIMD solution.
Upvotes: 4