Reputation: 37
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define pow(x) ((x) * (x))
#define NUM_THREADS 8
#define wmax 1000
#define Nv 2
#define N 5
int b=0;
float Points[N][Nv]={ {0,1}, {3,4}, {1,2}, {5,1} ,{8,9}};
float length[wmax+1]={0};
float EuclDist(float* Ne, float* Pe) {
int i;
float s = 0;
for (i = 0; i < Nv; i++) {
s += pow(Ne[i] - Pe[i]);
}
return s;
}
void DistanceFinder(float* a[]){
int i;
#pragma omp simd
for (i=1;i<N+1;i++){
length[b] += EuclDist(&a[i],&a[i-1]);
}
//printf(" %f\n", length[b]);
}
void NewRoute(){
//some irrelevant things
DistanceFinder(Points);
}
int main(){
omp_set_num_threads(NUM_THREADS);
do{
b+=1;
NewRoute();
} while (b<wmax);
}
Trying to parallelize this loop and trying different things, tried this one.
Seems to be the fastest, however is it correct to use SIMD like that? Because I'm using a previous iteration (i
and i - 1
). The results I see though are correct weirdly or not.
Upvotes: 1
Views: 122
Reputation: 51443
Seems to be the fastest, however is it correct to use SIMD like that?
First, there is a race condition that needs to be fixed, namely during the updates of the array length[b]
. Moreover, you are accessing memory outside the array a
; (iterating from 1
to N + 1
), and you are passing &a[i]
. You can fix the race condition by using OpenMP reduction
clause:
void DistanceFinder(float* a[]){
int i;
float sum = 0;
float tmp;
#pragma omp simd private(tmp) reduction(+:sum)
for (i=1;i<N;i++){
tmp = EuclDist(a[i], a[i-1]);
sum += tmp;
}
length[b] += sum;
}
Furthermore, you need to provide a version of EuclDist
as follows:
#pragma omp declare simd uniform(Ne, Pe)
float EuclDist(float* Ne, float* Pe) {
int i;
float s = 0;
for (i = 0; i < Nv; i++)
s += pow(Ne[i] - Pe[i]);
return s;
}
Because I'm using a previous iteration (i and i - 1).
In your case, it is okay, since the array a
is just being read.
The results I see though are correct weirdly or not.
Very-likely there was no vectorization taking place. Regardless, it would still be undefined behavior due to the aforementioned race condition.
You can simplify your code so that it increases the likelihood of the vectorization actually happening, for instance:
void DistanceFinder(float* a[]){
int i;
float sum = 0;
float tmp;
#pragma omp simd private(tmp) reduction(+:sum)
for (i=1;i<N;i++){
tmp = pow(a[i][0] - a[i-1][0]) + pow(a[i][1] - a[i-1][1])
sum += tmp;
}
length[b] += sum;
}
A further change that you can do to improve the performance of your code is to allocate the matrix (that is passed as a parameter of the function DistanceFinder
) in a manner that when you iterate over its rows (i.e., a[i]
) you would be iterating over continuous memory address.
For instance, you could pass two arrays a1
and a2
to represent the first and second columns of the matrix a
:
void DistanceFinder(float a1[], float a2[]){
int i;
float sum = 0;
float tmp;
#pragma omp simd private(tmp) reduction(+:sum)
for (i=1;i<N;i++){
tmp = pow(a1[i] - a1[i-1]) + pow(a2[i][1] - a2[i-1][1])
sum += tmp;
}
length[b] += sum;
}
Upvotes: 1