Compile-time AVX detection when using multi-versioning

Question

I have a fairly large function compiled for two different architectures:

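// Shared body, defined first so both versions below can see it.
// always_inline is meant to be combined with inline.
__attribute__((always_inline)) inline void doStuffImpl()
{
    (...)
}

__attribute__ ((target ("arch=broadwell"))) void doStuff()
{
    doStuffImpl();
}

__attribute__ ((target ("arch=nocona"))) void doStuff()
{
    doStuffImpl();
}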

I know this is the old way of doing multi-versioning, but I'm using GCC 4.9.3. Also, doStuffImpl() is actually not a single function but a bunch of inlined functions, with doStuff() being the last real (non-inlined) function call, but I don't think that changes anything.

The function contains some code that is auto-vectorized by the compiler, but I also need to add some hand-crafted intrinsics, obviously different for each of the two flavours. The question is: how can I recognise at compile time which SIMD extensions are available? I tried something like:

#ifdef __AVX2__
AVX_intrinsics();
#elif defined(__SSE4_2__)
SSE_intrinsics();
#endif

But it seems that these macros are set by the "global" -march flag, not by the target attribute of the multi-versioned function.

Godbolt example (the intrinsics are garbage, but it shows my point)

I could extract this part into a separate multi-versioned function, but that would add the cost of dispatching and an extra function call. Is there any way to differentiate at compile time between the two multi-versioned variants of a function?

Andrey Semashev · Accepted Answer

As answered in the comments:

I'd recommend moving each of the CPU targets to a separate translation unit, which is compiled with the corresponding compiler flags. The common doStuffImpl function can be implemented in a header, included in each of the TUs. In that header, you can use predefined macros like __AVX__ to test for available ISA extensions. The __attribute__((target)) attributes are no longer needed and can be removed in this case.
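A minimal sketch of that layout, under stated assumptions: the file names `stuff_impl.h`, `stuff_broadwell.cpp` and `stuff_nocona.cpp`, the `STUFF_ENTRY` macro and the `addArrays` kernel are all hypothetical stand-ins for the real code. The first part is what would live in the shared header; each .cpp would `#define STUFF_ENTRY` to its own name (e.g. `doStuff_broadwell`), include the header, and be built with its own flags such as `-march=broadwell` or `-march=nocona`. Because ISA macros like `__AVX2__` are evaluated per translation unit, each copy picks its own branch. For illustration everything is collapsed into one file here.

```cpp
// ---- stuff_impl.h (hypothetical shared header) ----
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

#ifndef STUFF_ENTRY
#define STUFF_ENTRY doStuff_generic  // each per-target TU defines its own name
#endif

// Hypothetical kernel: a[i] += b[i]. The branch is selected by the flags of
// the TU that includes this header, not by __attribute__((target)).
// 'static' gives every TU its own private copy of this function.
static inline void addArrays(std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX2__)
    for (; i + 8 <= n; i += 8) {  // 8 x int32 per 256-bit vector
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(a + i), _mm256_add_epi32(va, vb));
    }
#elif defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {  // 4 x int32 per 128-bit vector
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(a + i), _mm_add_epi32(va, vb));
    }
#endif
    for (; i < n; ++i)  // scalar tail, or the whole loop without SIMD macros
        a[i] += b[i];
}

// ---- body of each per-target .cpp file ----
// e.g. stuff_broadwell.cpp: #define STUFF_ENTRY doStuff_broadwell, then
// #include "stuff_impl.h"; compiled with -march=broadwell.
void STUFF_ENTRY(std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    addArrays(a, b, n);
}
```

Making the shared helper `static` matters in this design: a plain `inline` function with external linkage would be defined differently in each TU, and the linker could keep just one of those copies (possibly the AVX2 one) for both entry points. The runtime choice between `doStuff_broadwell` and `doStuff_nocona` then happens once, at the level where the multi-versioned `doStuff` used to be dispatched, for example with GCC's `__builtin_cpu_supports("avx2")`. Note also that Nocona only supports up to SSE3, so `__SSE4_2__` would not be defined in a TU built with `-march=nocona`; the sketch tests `__SSE2__` for that reason.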
