Athanasia Pavlidou

Reputation: 199

How to make a thread wait for another to finish using OpenMP threads?

I want to parallelize the following loop that fills the A matrix. For every element A[i][j] that is calculated, I want the values in A[i-1][j], A[i-1][j-1] and A[i][j-1] to have been calculated first. So my thread has to wait for the threads working on those positions to have produced their results. I've tried to achieve this like this:

#pragma omp parallel for num_threads(threadcnt) \
        default(none) shared(A, lenx, leny) private(j)
for (i=1; i<lenx; i++)
{
    for (j=1; j<leny; j++)
    {
        do
        {
        } while (A[i-1][j-1] == -1 || A[i-1][j] == -1 || A[i][j-1] == -1);

        A[i][j] = max(A[i-1][j-1]+1, A[i-1][j]-1, A[i][j-1]-1);
    }
}

My A matrix is initialized to -1, so if A[i][j] equals -1 the calculation for that cell has not completed yet. It takes more time than the serial program though. Any idea how to avoid the while loop?

Upvotes: 1

Views: 2785

Answers (2)

Alain Merigot

Reputation: 11537

Your solution cannot work. Since A is initialized to -1 and A[0][j] is never modified, for i==1 the loop keeps testing A[1-1][j] == -1, which is always true, so the busy-wait never terminates. By the way, if A is initialized to -1, how could the max ever give you anything but values derived from -1?

When you have a dependency problem like this, there are two solutions.

The first one is to serialize your code. OpenMP has the ordered directive to do that, but the drawback is that you lose parallelism (while still paying the thread-creation cost). OpenMP 4.5 has a way to describe dependencies with the depend and sink/source clauses, but I do not know how efficiently compilers deal with that, and my compilers (gcc 7.3 and clang 6.0) do not support this feature.
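For illustration, here is a minimal (untested) sketch of the plain ordered variant applied to your loop, assuming the same three-argument max helper as in your code; since the whole update sits inside the ordered region, iterations execute in order and essentially nothing runs in parallel:

#pragma omp parallel for ordered
for (int i=1; i<lenx; i++)
{
#pragma omp ordered
    {
        // executed in iteration order of i, so all three neighbours are already computed
        for (int j=1; j<leny; j++)
            A[i][j] = max(A[i-1][j-1]+1, A[i-1][j]-1, A[i][j-1]-1);
    }
}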

The second solution is to modify your algorithm to remove the dependencies. Right now you are computing the maximum of all values that are to the left of or above a given element. Let's turn it into a simpler problem: compute the maximum of all values to the left of a given element. We can easily parallelize it over the different rows, as there is no inter-row dependency.

// compute b[i][j] = max_{k<=j} a[i][k]
#pragma omp parallel for
  for(int i=0;i<n;i++){
    for(int j=0;j<n;j++){
      // max per row
      if(j!=0) b[i][j]=max(a[i][j],b[i][j-1]);
      else b[i][j]=a[i][j]; // left column initialised to value of a
    }
  }

Consider another simple problem: computing the prefix maximum down the different columns. It is again easy to parallelize, but this time on the inner loop, as there is no inter-column dependency.

// compute c[i][j] = max_{k<=i} b[k][j]
  for(int i=0;i<n;i++){
#pragma omp parallel for
    for(int j=0;j<n;j++){
      // max per column
      if(i!=0) c[i][j]=max(b[i][j],c[i-1][j]);
      else c[i][j]=b[i][j]; // upper row initialised to value of b
    }
  }

Now you just have to chain these two computations to get the expected result. Here is the final code (using a single array, with some cleanup):

#pragma omp parallel for
  for(int i=0;i<n;i++){
    for(int j=1;j<n;j++){
      // max per row
      a[i][j]=max(a[i][j],a[i][j-1]);
    }
  }
  for(int i=1;i<n;i++){
#pragma omp parallel for
    for(int j=0;j<n;j++){
      // max per column
      a[i][j]=max(a[i][j],a[i-1][j]);
    }
  }

Upvotes: 1

Michael Klemm

Reputation: 2853

The waiting loop seems sub-optimal. Apart from burning cores that are spin-waiting, you will also need a plethora of well-placed flush directives to make this code work.

One alternative, especially in the context of a more general parallelization scheme, would be to use tasks and task dependences to model the dependences between the different array elements:

// i and j are declared in the loop headers so that each generated task
// gets its own (firstprivate) copy of the indices it was created with
#pragma omp parallel
#pragma omp single
for (int i=1; i<lenx; i++) {
    for (int j=1; j<leny; j++) {
#pragma omp task depend(in:A[i-1][j-1],A[i-1][j],A[i][j-1]) depend(out:A[i][j])
        A[i][j] = max(A[i-1][j-1]+1, A[i-1][j]-1, A[i][j-1]-1);
    }
}

You may want to think about blocking the matrix updates, so that each task receives a block of the matrix instead of a single element; the general idea will remain the same (see the sketch below).
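As a rough, untested sketch of such a blocked variant (assuming a hypothetical tile size BS and the same three-argument max helper as in the question), the dependences can be attached to a representative element, here the top-left corner, of each tile:

#define BS 64   /* hypothetical tile size */

#pragma omp parallel
#pragma omp single
for (int i0=1; i0<lenx; i0+=BS) {
    for (int j0=1; j0<leny; j0+=BS) {
        /* representative corners of the upper and left neighbour tiles;
           clamped to the border (row/column 0), which is never written,
           when that neighbour tile does not exist */
        int ip = (i0 > BS) ? i0-BS : 0;
        int jp = (j0 > BS) ? j0-BS : 0;
#pragma omp task depend(in:A[ip][jp],A[ip][j0],A[i0][jp]) depend(out:A[i0][j0])
        {
            int imax = (i0+BS < lenx) ? i0+BS : lenx;
            int jmax = (j0+BS < leny) ? j0+BS : leny;
            for (int i=i0; i<imax; i++)
                for (int j=j0; j<jmax; j++)
                    A[i][j] = max(A[i-1][j-1]+1, A[i-1][j]-1, A[i][j-1]-1);
        }
    }
}

Each task now covers a whole tile, so the number of tasks (and hence the scheduling overhead) drops by roughly a factor of BS*BS.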

Another useful OpenMP feature could be the ordered construct and its ability to express exactly this kind of data dependency:

#pragma omp parallel for ordered(2)
for (int i=1; i<lenx; i++) {
    for (int j=1; j<leny; j++) {
// wait until iterations (i-1,j) and (i,j-1) have passed their source point
#pragma omp ordered depend(sink:i-1,j) depend(sink:i,j-1)
        A[i][j] = max(A[i-1][j-1]+1, A[i-1][j]-1, A[i][j-1]-1);
// signal that iteration (i,j) is done
#pragma omp ordered depend(source)
    }
}

PS: The code above is untested, but it should get the rough idea across.

Upvotes: 2
