RubenLaguna

Reputation: 24806

How to use the _bzhi_u32 intrinsic without letting gcc insert other BMI2 instructions everywhere?

I would like to use the _bzhi_u32 intrinsic, but I don't want to use the -mbmi2 flag, since that makes gcc use other BMI2 instructions (notably SHLX in many << shifts), which will produce SIGILL (Illegal instruction) if the host where the executable runs doesn't support BMI2.

I only use _bzhi_u32 in one function, and I guard its use by checking at runtime that it is supported via __builtin_cpu_is("corei7"), defaulting to another implementation if it is not. But I cannot guard the other BMI2 instructions that gcc inserts when -mbmi2 is used.

The problem is that the _bzhi_u32 intrinsic won't be defined in x86intrin.h unless -mbmi2 is specified (with the undesired effect of gcc sprinkling SHLX all over the place).

Upvotes: 4

Views: 878

Answers (3)

RubenLaguna

Reputation: 24806

There are several alternatives that avoid specifying -mbmi2 globally:

  1. If using GCC 4.9 or higher, you can just include x86intrin.h and declare the function that uses _bzhi_u32 with __attribute__((target("bmi2"))). That way gcc will generate BMI2 instructions only in that function. This doesn't work on 4.8 and lower (_bzhi_u32 is not defined unless __BMI2__ is set, and even if it is, the linker will complain with undefined reference to '_bzhi_u32').
  2. Put the definition of the function in its own .c file and put #pragma GCC target "bmi2" at the top. This defines __BMI2__ and enables BMI2 instruction generation for this translation unit only.
  3. Put the function in its own file as in option 2 and compile just that file with -mbmi2 (which is equivalent to the #pragma GCC target option).
  4. Use inline assembly instead of intrinsics as explained in this other answer.

Options 2 and 3 limit your ability to inline the function or declare it static. Option 1 is the way to go if you are using GCC 4.9 or higher.
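A minimal sketch of option 1, assuming GCC 4.9 or higher on x86. The function names (zero_high_bits, zero_high_bits_bmi2, zero_high_bits_generic) are made up for illustration. The runtime check uses __builtin_cpu_supports("bmi2"), which tests for the feature directly; that assumes a GCC release that accepts the "bmi2" feature name, which older versions may not.

```c
#include <stdint.h>
#include <x86intrin.h>

/* Only this function is compiled with BMI2 enabled; the rest of the
   translation unit is built with the default target options. */
__attribute__((target("bmi2")))
static uint32_t zero_high_bits_bmi2(uint32_t value, uint32_t index)
{
    return _bzhi_u32(value, index);
}

/* Portable fallback (hypothetical helper) mirroring BZHI semantics:
   only the low 8 bits of the index are used, and an index of 32 or
   more leaves the value unchanged. */
static uint32_t zero_high_bits_generic(uint32_t value, uint32_t index)
{
    index &= 0xFF;
    if (index >= 32)
        return value;
    return value & ((UINT32_C(1) << index) - 1);
}

/* Dispatch at runtime so the BMI2 path is taken only on CPUs that have it. */
uint32_t zero_high_bits(uint32_t value, uint32_t index)
{
    if (__builtin_cpu_supports("bmi2"))
        return zero_high_bits_bmi2(value, index);
    return zero_high_bits_generic(value, index);
}
```

Because the target attribute differs between caller and callee, gcc will not inline zero_high_bits_bmi2 into zero_high_bits, so the BZHI instruction stays behind the runtime check.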

Upvotes: 3

Jason

Reputation: 3917

Instead of using the intrinsic, it may be easier to embed the assembler code...

uint32_t val, i;

/* gcc defaults to AT&T syntax: index first, source second, destination last */
asm ("bzhi %2, %1, %0" : "=r"(val) : "r"(val), "r"(i) : "cc");

Upvotes: 1

Marc Glisse

Reputation: 7925

Quote from gcc 4.9 release notes:

It is now possible to call x86 intrinsics from select functions in a file that are tagged with the corresponding target attribute without having to compile the entire file with the -mxxx option. This improves the usability of x86 intrinsics and is particularly useful when doing Function Multiversioning.

Upvotes: 2
