Jimbo

Reputation: 3284

Specify the SIMD level of a function that the compiler can use

I wrote some code and compiled it using gcc with the native architecture option.

Typically I can take this code and run it on an older computer that has AVX but not AVX2, and it works fine. It seems, however, that the compiler is now actually emitting AVX2 instructions (finally!) without my having to write SIMD intrinsics myself.

I'd like to modify the program so that both pathways are supported (AVX2 and non-AVX2). In other words, I'd like something like the following pseudocode.

if (AVX2){
   callAVX2Version();
}else if (AVX){
   callAVXVersion();
}else{
   callSSEVersion();
}

void callAVX2Version(){
#pragma gcc -mavx2
}

void callAVXVersion(){
#pragma gcc -mavx
}

I know how to do the runtime detection part. My question is whether it is possible to do the function-specific SIMD selection part.

Upvotes: 4

Views: 2178

Answers (2)

Mellester

Reputation: 940

The simple and clean option

The gcc target attribute can be used directly, like so:

[[gnu::target("avx")]]
void foo(){}

[[gnu::target("default")]]
void foo(){}

[[gnu::target("arch=sandybridge")]]
void foo(){}

The call then becomes

foo();

This option does away with the need to name each function differently. If you check the output on godbolt, for example, you will see that gcc creates a @gnu_indirect_function for you and initially points it at a .resolver function. The resolver reads __cpu_model to find out what the CPU supports and sets the indirect function to the matching version, so every subsequent call is a simple indirect call. Simple, isn't it? But you might need to stay closer to your original code base, so there are other ways.
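To make that a bit more concrete, here is a minimal sketch of the same pattern with an actual loop body. The function name sum and its signature are made up for illustration. It assumes g++ on x86 with ifunc support (for example Linux with glibc); as far as I know, this overload-style multiversioning is only available in the C++ front end, not in C.

```cpp
#include <cstddef>

// Hypothetical kernel: the three bodies are identical, only the
// instruction set the compiler may use for each one differs.
[[gnu::target("default")]]
int sum(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

[[gnu::target("avx")]]
int sum(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

[[gnu::target("avx2")]]
int sum(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
```

Callers just write sum(ptr, n); which version actually runs is decided by the resolver, not by the call site.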

Function switching

If you do need explicit function switching, as in your original example, the following can be used. It relies on gcc's descriptively named builtins, so it is clear that you are switching on architecture.

[[gnu::target("avx")]]
int foo_avx(){ return 1;}

[[gnu::target("default")]]
int foo(){return 0;}

[[gnu::target("arch=sandybridge")]]
int foo_sandy(){return 2;}

int main ()
{
    if (__builtin_cpu_is("sandybridge"))
        return foo_sandy();
    else if (__builtin_cpu_supports("avx"))
        return  foo_avx();
    else
        return foo();
}
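Mapped onto the AVX2 / AVX / fallback split from the question, the same idea might look like the sketch below. The scale functions are hypothetical placeholders for your own kernels, and the feature names passed to __builtin_cpu_supports are the ones gcc documents for x86.

```cpp
// Hypothetical kernels: same source, compiled for different instruction sets.
[[gnu::target("avx2")]]
void scale_avx2(float* x, int n, float k) {
    for (int i = 0; i < n; ++i) x[i] *= k;
}

[[gnu::target("avx")]]
void scale_avx(float* x, int n, float k) {
    for (int i = 0; i < n; ++i) x[i] *= k;
}

// No attribute: built with whatever the command line specifies.
void scale_baseline(float* x, int n, float k) {
    for (int i = 0; i < n; ++i) x[i] *= k;
}

// Runtime switch, checking the most capable level first.
void scale(float* x, int n, float k) {
    if (__builtin_cpu_supports("avx2"))
        scale_avx2(x, n, k);
    else if (__builtin_cpu_supports("avx"))
        scale_avx(x, n, k);
    else
        scale_baseline(x, n, k);
}
```

The order of the checks matters: an AVX2 machine also reports AVX, so the widest level has to come first.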

Define your own indirect function

You may want to be more explicit for the benefit of other readers, or you may be on a platform where indirect functions are not supported. Below is a way to do the same thing as the first option, but entirely in C++ code, using a static local function pointer. This lets you order the priority of the targets to your own liking, and in cases where the built-in mechanism isn't supported you can supply your own.

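// Assumes foo_sandy and foo_avx are defined as in the previous example,
// with the default version named foo_default instead of foo.
auto foo()
{
    using T = decltype(foo_default);
    // Equivalent to: static int (*pointer)() = nullptr;
    static T* pointer = nullptr;
    if (pointer == nullptr)
    {
        // Resolve once, on the first call; later calls reuse the pointer.
        if (__builtin_cpu_is("sandybridge"))
            pointer = &foo_sandy;
        else if (__builtin_cpu_supports("avx"))
            pointer = &foo_avx;
        else
            pointer = &foo_default;
    }
    return pointer();
}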

As a bonus note

The templated example on godbolt uses template<class ... Ts> to deal with overloads of your functions. That means that if you define a family of callXXXVersion(int), then foo(int) will happily call the matching overload for you, as long as you have defined that overload for the entire family.
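A rough sketch of what such a variadic wrapper might look like is below. This is not the linked example itself; the work_* names are invented, and it only shows the forwarding idea with a plain runtime check (C++14 or later for the deduced return type).

```cpp
#include <utility>

// Hypothetical family: every overload exists for every target.
[[gnu::target("avx2")]] int    work_avx2(int x)    { return x * 2; }
[[gnu::target("avx2")]] double work_avx2(double x) { return x * 0.5; }

int    work_default(int x)    { return x * 2; }
double work_default(double x) { return x * 0.5; }

// Forwards whatever arguments it receives; overload resolution then picks
// the matching member of the selected family.
template <class... Ts>
auto work(Ts&&... args) {
    if (__builtin_cpu_supports("avx2"))
        return work_avx2(std::forward<Ts>(args)...);
    return work_default(std::forward<Ts>(args)...);
}
```

If one family is missing an overload, the wrapper fails to compile for those argument types, which is why the whole family has to be defined.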

Upvotes: 7

Jimbo

Reputation: 3284

Here's my solution. I can compile with AVX2 support and still run on my Ivy Bridge processor (AVX only) just fine.

The functions are:

__attribute__((target("arch=haswell")))
void fir_avx2_std(STD_DEF){
    STD_FIR;    
}

__attribute__((target("arch=sandybridge")))
void fir_avx_std(STD_DEF){
    STD_FIR;
}

//Use default - no arch specified
void fir_sse_std(STD_DEF){
    STD_FIR;    
}

The call is:

if (s.HW_AVX2 && s.OS_AVX){
    fir_avx2_std(STD_Call);
}else if(s.HW_AVX && s.OS_AVX){
    fir_avx_std(STD_Call);
}else{
    fir_sse_std(STD_Call);
}   

s is a structure that is populated based on some code I found online (https://github.com/Mysticial/FeatureDetector)

STD_FIR is a macro with the actual code, which gets optimized differently for each architecture.
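To illustrate the macro trick, here is a self-contained sketch. The macro contents are hypothetical stand-ins (a naive FIR loop), not my real definitions, and the dispatcher uses gcc's __builtin_cpu_supports in place of the FeatureDetector structure to keep the example short.

```cpp
// Hypothetical stand-ins for the real macros.
#define STD_DEF  const float* x, const float* h, float* y, int n, int taps
#define STD_Call x, h, y, n, taps
#define STD_FIR                                   \
    for (int i = 0; i + taps <= n; ++i) {         \
        float acc = 0.0f;                         \
        for (int k = 0; k < taps; ++k)            \
            acc += x[i + k] * h[k];               \
        y[i] = acc;                               \
    }

// Same body three times; each copy is optimized for its own target.
__attribute__((target("arch=haswell")))
void fir_avx2_std(STD_DEF) { STD_FIR }

__attribute__((target("arch=sandybridge")))
void fir_avx_std(STD_DEF) { STD_FIR }

// Use default - no arch specified
void fir_sse_std(STD_DEF) { STD_FIR }

// Assumed dispatcher; arch=haswell also enables FMA, so check that too.
void fir_std(STD_DEF) {
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        fir_avx2_std(STD_Call);
    else if (__builtin_cpu_supports("avx"))
        fir_avx_std(STD_Call);
    else
        fir_sse_std(STD_Call);
}
```

The point of the macro is that the source is written once but compiled separately inside each function, so each copy picks up that function's target options.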

I'm compiling with: -std=c11 -ffast-math -O3

I originally had -march=haswell as well, but that was causing problems.

Note: I'm not entirely sure this is the best breakdown of targets ... Also, I tried getting target_clones to work, but I was getting an error about needing ifunc (I thought gcc did that for me ...)
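For reference, on a toolchain where ifunc is available (for example gcc on Linux with glibc), target_clones would look roughly like the sketch below. The dot function is a made-up example, and my guess is that the ifunc error means the platform I tried it on lacks that support.

```cpp
// One definition; the compiler emits one clone per listed target plus a
// resolver that picks among them at load time (requires ifunc support).
[[gnu::target_clones("avx2", "avx", "default")]]
int dot(const int* a, const int* b, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
}
```

Compared with the hand-written if/else above, this avoids the macro and the separate function names, at the cost of depending on ifunc.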

Upvotes: 1
