Reputation: 3284
I wrote some code and compiled it using gcc with the native architecture option (-march=native).
Typically I can take this code and run it on an older computer that doesn't have AVX2 (only AVX), and it works fine. It seems, however, that the compiler is now actually emitting AVX2 instructions (finally!), rather than me needing to write SIMD intrinsics myself.
I'd like to modify the program so that both pathways are supported (AVX2 and non-AVX2). In other words, I'd like something like the following pseudocode.
if (AVX2){
    callAVX2Version();
}else if (AVX){
    callAVXVersion();
}else{
    callSSEVersion();
}
void callAVX2Version(){
    #pragma gcc -mavx2
}
void callAVXVersion(){
    #pragma gcc -mavx
}
I know how to do the runtime detection part; my question is whether it is possible to do the function-specific SIMD selection part.
Upvotes: 4
Views: 2178
Reputation: 940
The gcc target attribute can be used directly on several versions of the same function (function multiversioning), like so:
[[gnu::target("avx")]]
void foo(){}
[[gnu::target("default")]]
void foo(){}
[[gnu::target("arch=sandybridge")]]
void foo(){}
The call then becomes:
foo();
This option does away with the need to name each function differently. If you check the output on godbolt, for example, you will see that the compiler creates a @gnu_indirect_function symbol for you and first points it at a .resolver function. The resolver reads __cpu_model to find out which version can be used and sets the indirect function to that pointer, so any subsequent call is a simple indirect call. Simple, isn't it? But you might need to stay closer to your original code base, so there are other ways.
If you do need explicit function switching, as in your original example, the following can be used. It relies on clearly named builtins, so it is obvious that you are switching on architecture:
[[gnu::target("avx")]]
int foo_avx(){ return 1;}
[[gnu::target("default")]]
int foo(){return 0;}
[[gnu::target("arch=sandybridge")]]
int foo_sandy(){return 2;}
int main ()
{
    if (__builtin_cpu_is("sandybridge"))
        return foo_sandy();
    else if (__builtin_cpu_supports("avx"))
        return foo_avx();
    else
        return foo();
}
You may want to be more explicit for other readers, or you may be targeting a platform where indirect functions are not supported. Below is a way that does the same as the first option, but entirely in C++ code, using a static local function pointer. This also lets you order the priority of the targets to your own liking, and where the builtins aren't supported you can supply your own detection.
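// Assumes the default version is named foo_default here, because foo is now the dispatcher.
auto foo()
{
    using T = decltype(foo_default);
    static T* pointer = nullptr;
    //static int (*pointer)() = nullptr;
    if (pointer == nullptr)
    {
        // Resolve once, on the first call; later calls reuse the cached pointer.
        if (__builtin_cpu_is("sandybridge"))
            pointer = &foo_sandy;
        else if (__builtin_cpu_supports("avx"))
            pointer = &foo_avx;
        else
            pointer = &foo_default;
    }
    return pointer();
}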
The templated example on godbolt uses template<class ... Ts> to deal with overloads of your functions, meaning that if you define a family of callXXXVersion(int) functions, then foo(int) will happily call the matching overload for you, as long as you have defined the entire family.
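To give a rough idea of what such a variadic dispatcher could look like, here is a minimal sketch. It is not the godbolt example itself. The scale_* family, the TARGET macro and the has_avx/has_avx2 helpers are hypothetical names made up for illustration, and the sketch assumes GCC or Clang on x86 (on anything else it simply falls back to the default version).

```cpp
#include <cassert>

// Assumption: GCC or Clang on x86; elsewhere every call uses the default version.
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define TARGET(features) __attribute__((target(features)))
static bool has_avx2() { return __builtin_cpu_supports("avx2") != 0; }
static bool has_avx()  { return __builtin_cpu_supports("avx") != 0; }
#else
#define TARGET(features)
static bool has_avx2() { return false; }
static bool has_avx()  { return false; }
#endif

// Hypothetical family, one-argument overloads.
TARGET("avx2") int scale_avx2(int x) { return 2 * x; }
TARGET("avx")  int scale_avx(int x) { return 2 * x; }
int scale_default(int x) { return 2 * x; }

// Hypothetical family, two-argument overloads.
TARGET("avx2") int scale_avx2(int x, int k) { return k * x; }
TARGET("avx")  int scale_avx(int x, int k) { return k * x; }
int scale_default(int x, int k) { return k * x; }

// One dispatcher for the whole family. Each instantiation keeps its own
// cached pointer, and the pointer type selects the matching overload.
template <class... Ts>
auto scale(Ts... args) {
    using R = decltype(scale_default(args...));
    using Fn = R (*)(Ts...);
    static const Fn pointer = [] {
        Fn p = &scale_default;
        if (has_avx2())
            p = &scale_avx2;
        else if (has_avx())
            p = &scale_avx;
        return p;
    }();
    return pointer(args...);
}

// Every variant must compute the same result; only the generated code differs.
void check_scale() {
    assert(scale(21) == scale_default(21));
    assert(scale(7, 3) == scale_default(7, 3));
}
```

Calling scale(21) picks among the one-argument versions and scale(7, 3) among the two-argument ones. Because the overload is chosen by exact pointer type, the argument types must match a member of the family exactly, and every member needs all three variants.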
Upvotes: 7
Reputation: 3284
Here's my solution. I can compile with AVX2 support and still run on my Ivy Bridge processor (AVX only) just fine.
The functions are:
__attribute__((target("arch=haswell")))
void fir_avx2_std(STD_DEF){
    STD_FIR;
}
__attribute__((target("arch=sandybridge")))
void fir_avx_std(STD_DEF){
    STD_FIR;
}
//Use default - no arch specified
void fir_sse_std(STD_DEF){
    STD_FIR;
}
The call is:
if (s.HW_AVX2 && s.OS_AVX){
    fir_avx2_std(STD_Call);
}else if(s.HW_AVX && s.OS_AVX){
    fir_avx_std(STD_Call);
}else{
    fir_sse_std(STD_Call);
}
s is a structure that is populated based on some code I found online (https://github.com/Mysticial/FeatureDetector).
STD_FIR is a macro with the actual code, which gets optimized differently for each architecture.
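For anyone who wants to see the shape of the macro trick, here is a minimal sketch under some loud assumptions. FIR_PARAMS, FIR_ARGS and FIR_BODY are hypothetical stand-ins, not my actual STD_DEF, STD_Call and STD_FIR macros. The sketch uses target("avx2") and target("avx") instead of the arch= targets so that each attribute matches the feature being tested, and it uses __builtin_cpu_supports in place of the FeatureDetector structure. It assumes GCC or Clang on x86 and falls back to the plain version elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: GCC or Clang on x86; elsewhere only the default version is used.
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define TARGET(features) __attribute__((target(features)))
#define CPU_HAS(feature) (__builtin_cpu_supports(feature) != 0)
#else
#define TARGET(features)
#define CPU_HAS(feature) false
#endif

// Hypothetical parameter list, argument list and loop body, shared by all variants.
// The input must hold at least n + ntaps - 1 samples.
#define FIR_PARAMS const float* in, const float* taps, float* out, \
                   std::size_t n, std::size_t ntaps
#define FIR_ARGS in, taps, out, n, ntaps
#define FIR_BODY                                    \
    for (std::size_t i = 0; i < n; ++i) {           \
        float acc = 0.0f;                           \
        for (std::size_t k = 0; k < ntaps; ++k)     \
            acc += in[i + k] * taps[k];             \
        out[i] = acc;                               \
    }

// Same source text three times; the compiler optimizes each copy for its own target.
TARGET("avx2") void fir_avx2(FIR_PARAMS) { FIR_BODY }
TARGET("avx")  void fir_avx(FIR_PARAMS) { FIR_BODY }
void fir_sse(FIR_PARAMS) { FIR_BODY }

// Runtime selection, best instruction set first.
void fir(FIR_PARAMS) {
    assert(in != nullptr && taps != nullptr && out != nullptr);
    if (CPU_HAS("avx2"))
        fir_avx2(FIR_ARGS);
    else if (CPU_HAS("avx"))
        fir_avx(FIR_ARGS);
    else
        fir_sse(FIR_ARGS);
}
```

The point of the macro is only to avoid writing the loop three times; each function still gets its own code generation because of its target attribute.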
I'm compiling with: -std=c11 -ffast-math -O3. I originally had -march=haswell as well, but that was causing problems.
Note: I'm not entirely sure whether these are the best target breakdowns ...
Also, I tried getting target_clones to work, but I was getting an error about needing ifunc (I thought gcc did that for me ...)
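For reference, this is roughly what a target_clones version could look like where it is available. It is only a sketch: dot and CLONE_FOR_SIMD are hypothetical names, and the guard assumes x86-64 Linux with glibc and a compiler that knows the attribute. ifunc is a feature of the GNU/ELF toolchain, so one plausible explanation for the error, and this is an assumption on my part, is that the platform being targeted does not provide it.

```cpp
#include <cassert>
#include <cstddef>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

// Assumption: target_clones needs ifunc, which this sketch only expects on
// x86-64 Linux with glibc. Anywhere else the function is compiled once, as usual.
#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && \
    __has_attribute(target_clones)
#define CLONE_FOR_SIMD __attribute__((target_clones("avx2", "avx", "default")))
#else
#define CLONE_FOR_SIMD
#endif

// One definition; where the attribute is active the compiler emits one clone
// per listed target plus a resolver that picks a clone at load time.
CLONE_FOR_SIMD
float dot(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

Callers just call dot(a, b, n); there is no manual dispatch code, which is the appeal over the hand-written if/else chain.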
Upvotes: 1