Reputation: 13968
I'm trying to make gcc generate the bzhi instruction, part of BMI2, without using intrinsics, in order to write portable code. Given what bzhi does, I expected that objective to be relatively accessible.
The following SO answer provides a code example, simplified below:
unsigned bzhi32(unsigned value, int nbBits)
{
    return value & ((1u << nbBits) - 1);
}
clang has no problem generating a bzhi instruction from it, while I haven't found any way to get a similar outcome from gcc so far:
https://godbolt.org/g/jYrh8F
I was wondering whether this is possible.
This capability was at least requested, but I'm not sure whether it was completed.
If it was, maybe there are just some subtle issues in the code snippet, for example with types or properties, which could be fixed to make this transformation succeed with gcc.
Edit: added the u suffix to the constant, as suggested by @chux. It marginally changes the outcome for gcc, though it's still a 4-instruction function without bzhi.
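For reference, here is a minimal sketch of what the 32-bit form of bzhi computes, written as plain C. It is based on the instruction's documented behaviour (only the low 8 bits of the index are read, and an index of 32 or more leaves the value unchanged); the name bzhi32_ref is hypothetical and not part of any library. It also makes one difference from the snippet above visible: the plain mask expression has undefined behaviour in C when nbBits is 32 or more, whereas the instruction is defined for every index.

```c
#include <stdint.h>

/* Reference model of 32-bit bzhi (hypothetical helper, for illustration).
   Assumes the documented semantics: the instruction reads index[7:0] and
   clears bits [31:n] of the value only when n < 32. */
static uint32_t bzhi32_ref(uint32_t value, uint32_t index)
{
    uint32_t n = index & 0xFFu;               /* only the low 8 bits count */
    if (n >= 32u)
        return value;                         /* nothing to clear */
    return value & ((UINT32_C(1) << n) - 1u); /* shift count is < 32 here */
}
```

This is only a model for reasoning about edge cases, not a claim about which source pattern a given compiler recognizes.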
Upvotes: 5
Views: 598
Reputation: 5040
This optimization is not implemented in gcc as of January 2018 (there is a feature request). You can get the instruction by using intrinsics:
/* Build with -mbmi2 (or a -march= setting that enables BMI2). */
#include <x86intrin.h>
unsigned bzhi32(unsigned value, int nbBits) {
    return _bzhi_u32(value, nbBits);
}
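If portability is the concern, one possible compromise is to keep the intrinsic behind a preprocessor check and fall back to the mask expression elsewhere. The sketch below assumes the compiler defines __BMI2__ when BMI2 is enabled (gcc and clang do this under -mbmi2); the wrapper name bzhi32_portable is hypothetical, and it is only meant for nbBits in the range 0 to 32.

```c
#include <stdint.h>

#if defined(__BMI2__)
#include <x86intrin.h>   /* declares _bzhi_u32 on gcc and clang */
#endif

/* Hypothetical wrapper, intended for nbBits in 0..32. */
static inline uint32_t bzhi32_portable(uint32_t value, unsigned nbBits)
{
#if defined(__BMI2__)
    /* BMI2 is enabled at compile time: use the intrinsic directly. */
    return _bzhi_u32(value, nbBits);
#else
    /* Fallback: guard the shift, since shifting a 32-bit value by 32
       is undefined behaviour in C. */
    if (nbBits >= 32u)
        return value;
    return value & ((UINT32_C(1) << nbBits) - 1u);
#endif
}
```

The choice is made at compile time, so a binary built with -mbmi2 still requires a CPU that supports BMI2; selecting at run time would need a separate CPU feature check.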
Upvotes: 2