manus7

Reputation: 27

Why does my OpenMP simd directive have no effect?

I tried the following code to test the SIMD directive in OpenMP.

#include <iostream>
#include <sys/time.h>
#include <cmath>
#define N 4096
#define M 1000
using namespace std;

int main()
{
    timeval start,end;
    float a[N],b[N];
    for(int i=0;i<N;i++)
        b[i]=i;
    gettimeofday(&start,NULL);
    for(int j=0;j<M;j++)
    {
    #pragma omp simd 
        for(int i=0;i<N;i++)
            a[i]=pow(b[i],2.1);
    }
    gettimeofday(&end,NULL);
    int time_used=1000000*(end.tv_sec-start.tv_sec)+(end.tv_usec-start.tv_usec);
    cout<<"time_used="<<time_used<<endl;
    return 0;
}

But whether I compile it with

g++ -fopenmp simd.cpp

or

g++ simd.cpp

the reported values of "time_used" are almost the same. It looks like the SIMD directive I used has no effect. Thanks!

Additional question: I replaced

a[i]=pow(b[i],2.1);

by

a[i]=b[i]+2.1;

and when I compile it with

g++ -fopenmp simd.cpp

the output of "time_used" is about 12000. When I compile it with

g++ simd.cpp

the output of "time_used" is about 12000, almost the same as before.

My computer: Haswell i5, 8 GB RAM, Ubuntu Kylin 16.04, GCC 5.4.0

Upvotes: 0

Views: 1250

Answers (1)

Cody Gray

Reputation: 244772

The compiler generally can't auto-vectorize a call to a library function like pow, whose implementation it cannot see. It can only vectorize arithmetic operations that map directly onto SIMD instructions.

Therefore, you need a vector math library that implements the pow function using SIMD instructions. Intel provides one. I'm not sure if pow is one of the functions that it offers with vector optimizations, but I imagine it is. You should also beware that Intel's math library may not be optimal on AMD processors.
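For contrast, here is a minimal sketch of a loop that the vectorizer has a realistic chance of handling, because the loop body is plain arithmetic that the compiler can see. The names scale_and_shift and transform_all are hypothetical (they are not from the question), and the arithmetic is just a stand-in for pow. The declare simd pragma is part of OpenMP 4.0; without OpenMP support enabled, both pragmas are simply ignored and the code still runs as a scalar loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): its body is visible to the
// compiler, and `declare simd` requests a vector variant when OpenMP SIMD
// support is enabled. Otherwise the pragma is ignored.
#pragma omp declare simd
inline float scale_and_shift(float x)
{
    return x * 2.0f + 2.1f;
}

// Applies scale_and_shift to every element of `in` and returns the results.
std::vector<float> transform_all(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    const float* src = in.data();
    float* dst = out.data();
    const std::size_t n = in.size();

    // Plain arithmetic on contiguous arrays: a candidate for vectorization.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = scale_and_shift(src[i]);

    return out;
}
```

Whether this loop actually gets vectorized depends on the compiler version, the target, and the flags. With GCC, something like g++ -O3 -fopenmp-simd -fopt-info-vec should print a note for each loop it vectorized, which is a more direct check than comparing timings.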

You claim that you tried changing the pow function call to a simple addition, but didn't see any improvement in the results. I'm not quite sure how that is possible, because if you change the inner loop from:

a[i]=pow(b[i],2.1);

to, say:

a[i] += b[i];

or:

a[i] += (b[i] * 2);

then GCC, with optimizations enabled, notices that you never use the result and elides the entire loop. It was unable to perform this optimization with the pow function call, because it didn't know whether the function had any other side effects. However, with code that is visible to the optimizer, it can…well, optimize it. In some cases, it might be able to vectorize it. In this case, it was able to remove it entirely.
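One way to keep the optimizer from throwing the work away is to make the result observable, for example by summing the array after the timed region and printing or returning the sum. The following is only a sketch under that assumption: BenchResult and time_add_loop are names I made up, and I use std::chrono::steady_clock instead of gettimeofday purely for convenience.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct BenchResult {
    double checksum;         // sum of a[] after the loop, keeps the work "used"
    long long microseconds;  // elapsed time of the timed region
};

// Runs `a[i] += b[i]` over n elements, `repeats` times, and times it.
BenchResult time_add_loop(std::size_t n, int repeats)
{
    std::vector<float> a(n, 0.0f), b(n);
    for (std::size_t i = 0; i < n; ++i)
        b[i] = static_cast<float>(i);

    float* pa = a.data();
    const float* pb = b.data();

    const auto start = std::chrono::steady_clock::now();
    for (int j = 0; j < repeats; ++j) {
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i)
            pa[i] += pb[i];
    }
    const auto stop = std::chrono::steady_clock::now();

    // Consume the result so the loop cannot be treated as dead code.
    double checksum = 0.0;
    for (float v : a)
        checksum += v;

    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return BenchResult{checksum, static_cast<long long>(us.count())};
}
```

Calling time_add_loop(4096, 1000) and printing both fields mirrors the sizes in the question. This does not guarantee that the compiler leaves the loop exactly as written, but it does mean the stores to a can no longer be discarded as unused.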

If you tried code where the optimizer removed this loop entirely, and you still didn't see an improvement on your benchmark scores, then clearly this is not a bottleneck in your code and you needn't worry about trying to vectorize it.

Upvotes: 1
