Reputation: 17
I'm trying to parallelize this code with OpenMP 4.5 "omp teams" on a GPU (Nvidia GTX 1050) with CUDA 9.0. I tested the sequential and the parallel implementations on the CPU (i5-7400) and both work fine, but the GPU version doesn't work. The code is a matrix multiplication C = A * B.
I think the #pragma omp target teams distribute parallel for is not distributing the work correctly,
but I have no idea where the mistake could be.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
int main(int argc, char **argv)
{
    int n; // Given by the user
    int i, j, k;
    double start, end;
    for (i=0; i<n; i++)
        for (j=0; j<n; j++) {
            a[i][j] = ((double)rand())/((double)RAND_MAX);
            b[i][j] = ((double)rand())/((double)RAND_MAX);
            c[i][j] = 0.0;
        }
    start = omp_get_wtime();
    #pragma omp target data map(to: a[0:i][0:k], b[0:k][0:j]) map(tofrom:c[0:i][0:j])
    #pragma omp target teams distribute parallel for
    for (i=0; i<n; i++){
        for (k=0; k<n; k++) {
            for (j=0; j<n; j++) {
                c[i][j] += a[i][k]*b[k][j];
            }
        }
    }
    #pragma omp barrier
    end = omp_get_wtime();
    printf("the total time is %5.9f\n", end - start);
    // Check a random element: if fabs(d - c[i][j]) equals 0, the implementation is correct
    i = rand()%n;
    j = rand()%n;
    double d = 0.0;
    for (k=0; k<n; k++)
        d += a[i][k]*b[k][j];
    printf("Check on a random element: %18.9lE\n", fabs(d-c[i][j]));
    return 0;
}
Upvotes: 0
Views: 850
Reputation: 709
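Most likely the solution to your problem is to declare k and j as private variables. Both are declared outside the construct, and in C only the loop variables of the loops associated with the directive (here just i) are made private automatically, so as written the threads share j and k and overwrite each other's counters.
A few other comments, all reflected in the code below: the map clauses should use the matrix size n rather than the loop counters (k has not even been assigned a value when your target data directive is reached); collapse(2) distributes the i and j loops together; and with k as the innermost loop, each c[i][j] is updated by a single thread.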
#pragma omp target data map(to: a[0:n][0:n], b[0:n][0:n]) map(tofrom:c[0:n][0:n])
#pragma omp target teams distribute parallel for private(i,j,k) collapse(2)
for (i=0; i<n; i++)
    for (j=0; j<n; j++)
        for (k=0; k<n; k++)
            c[i][j] += a[i][k]*b[k][j];
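For reference, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not in the original post: the function name matmul_offload is hypothetical, the matrices are stored as flat row-major arrays of n*n doubles (the question does not show how a, b and c are declared), and the result is written as C = A * B rather than accumulated into an already zeroed c. Declaring the loop counters and the accumulator inside the loops makes them private without needing a private clause.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the original post.
   Computes C = A * B for n x n matrices stored as flat,
   row-major arrays of n*n doubles. */
void matmul_offload(int n, const double *a, const double *b, double *c)
{
    /* The map extents are given by the array size, not by loop counters. */
    size_t len = (size_t)n * (size_t)n;

    /* collapse(2) distributes the combined (i, j) iteration space.
       i, j, k and sum are declared inside the region, so every thread
       gets its own copy and no private clause is required. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:len], b[0:len]) map(from: c[0:len])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;  /* per-thread accumulator */
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;  /* one writer per element */
        }
    }
}
```

If the compiler is invoked without OpenMP enabled, the pragma is ignored and the loops run sequentially, which gives a convenient way to compare the offloaded result against a plain CPU run. The #pragma omp barrier after the loop in the question is not needed in either version, since the target region has finished by the time the host continues.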
Upvotes: 0