ibrajim19

Reputation: 17

Distribution of teams in OpenMP

I'm trying to parallelize this code with OpenMP 4.5 "omp teams" on a GPU (Nvidia GTX 1050) with CUDA 9.0. I tested the sequential and the parallel implementations on the CPU (i5-7400) and both work fine, but the GPU version doesn't work. The code is a matrix multiplication C = A * B.
I think the #pragma omp target teams distribute parallel for is not distributing the work correctly, but I have no idea where the mistake could be.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(int argc, char **argv)
{
   int n;//Given by the user
   int i, j, k;
    double start, end;

   for (i=0; i<n; i++) 
      for (j=0; j<n; j++) {
         a[i][j] = ((double)rand())/((double)RAND_MAX);
         b[i][j] = ((double)rand())/((double)RAND_MAX);
         c[i][j] = 0.0;
      }


   start = omp_get_wtime();
   #pragma omp target data map(to: a[0:i][0:k], b[0:k][0:j]) map(tofrom:c[0:i][0:j]) 
   #pragma omp target teams distribute parallel for
   for (i=0; i<n; i++){ 

      for (k=0; k<n; k++) { 

         for (j=0; j<n; j++) {
            c[i][j] += a[i][k]*b[k][j];
         }
     }
  }
    #pragma omp barrier
    end = omp_get_wtime();
    printf("the total time is %5.9f\n", end - start);


   // Check a random element: if fabs(d - c[i][j]) equals 0, the implementation is correct
   i = rand()%n;
   j = rand()%n;
   double d = 0.0;
   for (k=0; k<n; k++)
      d += a[i][k]*b[k][j];


   printf("Check on a random element: %18.9lE\n", fabs(d-c[i][j]));

   return 0;

}

Upvotes: 0

Views: 850

Answers (1)

user1584773

Reputation: 709

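Most likely the solution to your problem is to declare k and j as private variables. In C only the loop variable of the loop associated with the directive (i) is made private automatically; k and j are declared outside the loop nest, so the threads share them and race on their values.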

A few other comments:

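  • You may want to swap the loops on k and j. With i and j as the two outer loops you can collapse them, and each collapsed iteration then updates its own c[i][j].
  • There is no need for an explicit barrier in your code: unless you add nowait, the host already waits for the omp target region to finish before continuing.
  • The array sections in the map clauses should use n for their lengths. In your code k has not been assigned a value when the map clauses are evaluated.
  • I would suggest you compute the total residual, rather than checking a single random element, to verify that your implementation is correct.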

#pragma omp target data map(to: a[0:n][0:n], b[0:n][0:n]) map(tofrom:c[0:n][0:n]) 
#pragma omp target teams distribute parallel for private(i,j,k) collapse(2)
for (i=0; i<n; i++)
  for (j=0; j<n; j++)
    for (k=0; k<n; k++)
        c[i][j] += a[i][k]*b[k][j];
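
Since the question does not show how a, b and c are declared, here is a minimal self-contained sketch of the same idea under a few assumptions: the matrices are stored as flat row-major arrays of n*n doubles, and the function names matmul_offload and max_residual are made up for this illustration. Declaring the loop variables inside the for statements makes them private without a private clause, and the residual is taken as the largest absolute difference against a sequential recomputation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: C = A * B for n x n matrices stored as flat
 * row-major arrays (an assumption; the question omits the declarations). */
void matmul_offload(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a != NULL && b != NULL && c != NULL);

    /* Section lengths are written in terms of n, not of loop counters.
     * i, j and k are declared in the loops, so every thread gets its own. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;               /* private accumulator */
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;             /* one writer per element */
        }
    }
}

/* Hypothetical helper: largest absolute difference between c and a
 * sequential recomputation of A * B on the host. */
double max_residual(int n, const double *a, const double *b, const double *c)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double d = 0.0;
            for (int k = 0; k < n; k++)
                d += a[i * n + k] * b[k * n + j];
            double diff = d - c[i * n + j];
            if (diff < 0.0)
                diff = -diff;
            if (diff > worst)
                worst = diff;
        }
    }
    return worst;
}
```

Call matmul_offload(n, a, b, c) and then compare max_residual(n, a, b, c) against a small tolerance rather than exactly 0, since a GPU may round the sums slightly differently. If the compiler has no OpenMP support enabled, the pragma is ignored and the loops simply run sequentially on the host, which is a convenient way to test the logic first.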

Upvotes: 0
