Konzo
Konzo

Reputation: 37

Is using pragma omp simd like this correct?


#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define pow(x) ((x) * (x))
#define NUM_THREADS 8
#define wmax 1000
#define Nv 2
#define N 5

int b=0;

float Points[N][Nv]={ {0,1}, {3,4}, {1,2}, {5,1} ,{8,9}};

float length[wmax+1]={0};

float EuclDist(float* Ne, float* Pe) {
    int i;
    float s = 0;
 
    for (i = 0; i < Nv; i++) {
        s += pow(Ne[i] - Pe[i]); 
    }
    return s; 
}

void DistanceFinder(float* a[]){
    
    int i;
    #pragma omp simd
    for (i=1;i<N+1;i++){  
        length[b] += EuclDist(&a[i],&a[i-1]);
    }
    //printf(" %f\n", length[b]);
}



void NewRoute(){
//some irrelevant things

DistanceFinder(Points);
}

int main(){
    
omp_set_num_threads(NUM_THREADS);


do{
    b+=1;
    NewRoute();
    } while (b<wmax);
}

Trying to parallelize this loop and trying different things, tried this one.

Seems to be the fastest, however is it correct to use SIMD like that? Because I'm using a previous iteration (i and i - 1). The results I see though are correct weirdly or not.

Upvotes: 1

Views: 122

Answers (1)

dreamcrash
dreamcrash

Reputation: 51443

Seems to be the fastest, however is it correct to use SIMD like that?

First, there is a race condition that needs to be fixed, namely during the updates of the array length[b]. Moreover, you are accessing memory outside the array a; (iterating from 1 to N + 1), and you are passing &a[i]. You can fix the race condition by using OpenMP reduction clause:

void DistanceFinder(float* a[]){
    
    int i;
    float sum = 0;
    float tmp;
    #pragma omp simd private(tmp) reduction(+:sum)
    for (i=1;i<N;i++){  
        tmp = EuclDist(a[i], a[i-1]);
        sum += tmp;
    }
    length[b] += sum;
}

Furthermore, you need to provide a version of EuclDist as follows:

#pragma omp declare simd uniform(Ne, Pe)
float EuclDist(float* Ne, float* Pe) {
    int i;
    float s = 0;
    for (i = 0; i < Nv; i++)
        s += pow(Ne[i] - Pe[i]); 
    return s; 
}

Because I'm using a previous iteration (i and i - 1).

In your case, it is okay, since the array a is just being read.

The results I see though are correct weirdly or not.

Very-likely there was no vectorization taking place. Regardless, it would still be undefined behavior due to the aforementioned race condition.

You can simplify your code so that it increases the likelihood of the vectorization actually happening, for instance:

void DistanceFinder(float* a[]){
    int i;
    float sum = 0;
    float tmp;
    #pragma omp simd private(tmp) reduction(+:sum)
    for (i=1;i<N;i++){  
        tmp = pow(a[i][0] - a[i-1][0]) + pow(a[i][1] - a[i-1][1]) 
        sum += tmp;
    }
    length[b] += sum;
}

A further change that you can do to improve the performance of your code is to allocate the matrix (that is passed as a parameter of the function DistanceFinder) in a manner that when you iterate over its rows (i.e., a[i]) you would be iterating over continuous memory address.

For instance, you could pass two arrays a1 and a2 to represent the first and second columns of the matrix a:

  void DistanceFinder(float a1[], float a2[]){
        int i;
        float sum = 0;
        float tmp;
        #pragma omp simd private(tmp) reduction(+:sum)
        for (i=1;i<N;i++){  
            tmp = pow(a1[i] - a1[i-1]) + pow(a2[i][1] - a2[i-1][1]) 
            sum += tmp;
        }
        length[b] += sum;
    }

Upvotes: 1

Related Questions