Abhi

Reputation: 31

OpenMP results not as expected

I have a class that contains a dynamic 3D array. An object of the class calls a function that does some calculation to populate a 1D array and, at the end, copies the 1D array's data into the object's 3D array. The 1D array has the same size as the 3D array.
I am using OpenMP to speed up the calculation. Single-threaded execution gives the correct result, but as soon as I go to multiple threads I get weird results.
Sample code is given below. Kindly help me resolve the issue.

class A {
public:
   void func(float *buf);
   void populateRes(float *t);
private:
   float ***res;
};

A a[n];
int nthrd = omp_get_num_threads();
float *buf;
int cnt = 0, tid, omp_i;
while (cnt < nz)
{
   #pragma omp parallel shared(cnt) private(buf, tid, omp_i)
   {
      if (cnt == 0)
         buf = new float[x*y*z];

      #pragma omp for
      for (omp_i = 0; omp_i < n; omp_i++)
      {
         a[omp_i].func(buf);
         a[omp_i].populateRes(buf);
      }
   }
   cnt++;
   if (cnt >= nz)
      delete [] buf;
}

Upvotes: 0

Views: 232

Answers (1)

Hristo Iliev

Reputation: 74485

OpenMP does not preserve the values of private variables between different entries into the same parallel region, the same way that values of automatic local variables are not preserved between different invocations of a function (unless they are declared static). In fact, in most OpenMP implementations, parallel regions are separate functions and private variables are automatic.
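As a minimal standalone sketch of this effect (the code below is illustrative, not from the question):

#include <omp.h>
#include <cstdio>

int main()
{
   int x = 0;
   for (int pass = 0; pass < 2; pass++)
   {
      #pragma omp parallel private(x)
      {
         if (pass == 0)
            x = 42;                  // written on the first entry only
         else
            printf("x = %d\n", x);   // unspecified: each entry gets fresh,
                                     // uninitialised private copies of x
      }
   }
   return 0;
}

Whatever the second entry prints is not guaranteed by OpenMP; if it happens to be 42, that is just a coincidence of stack reuse. The same applies to buf in your code.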

This makes your code erroneous, since buf would only be allocated on the first iteration of the loop; on the following iterations your code operates on a new uninitialised local copy. It might happen (by pure chance) that the content of a particular thread's stack has not changed and thus buf retains its value. Also, the delete [] buf outside the parallel region operates on the outer buf, which is never assigned inside the region (the threads write only to their private copies), and it ignores the fact that there were multiple calls to new (one per thread), so the threads' buffers all leak.

If you'd like to allocate buf only once, you should put the while loop inside the parallel region and not vice versa. This would also improve the performance since the parallel region would be entered only once and there is an overhead associated with each entry.

A a[n];

#pragma omp parallel
{
   float *buf = new float[x*y*z];
   for (int cnt = 0; cnt < nz; cnt++)
   {
      #pragma omp for
      for (int i = 0; i < n; i++)
      {
         a[i].func(buf);
         a[i].populateRes(buf);
      }
   }
   delete [] buf;
}

(I don't see tid being used inside, so I've taken the liberty of removing it from the list of private variables.)

Changing the nesting of the two loops is possible since there is an implicit barrier at the end of the for worksharing construct. I don't know if there is other code that you've omitted, but given just the code from the question, the cnt loop could even be nested inside the worksharing construct, i.e.:

#pragma omp parallel
{
   float *buf = new float[x*y*z];
   #pragma omp for
   for (int i = 0; i < n; i++)
   {
      for (int cnt = 0; cnt < nz; cnt++)
      {
         a[i].func(buf);
         a[i].populateRes(buf);
      }
   }
   delete [] buf;
}

Upvotes: 1
