arnodu

OpenACC loop private clause and race condition

I'm trying to use a worker-private array with OpenACC, but I keep getting wrong results. I suspect some kind of race condition, but I can't find where.

I'm using the PGI compiler (18.10, OpenPOWER) and compile with:

pgf90 -gopt -O3 -Minfo=all -Mcuda=ptxinfo -acc -ta=tesla:cc35 main.F90

Here is a minimal example of what I'm trying to achieve:

#define lx 7000
#define ly 500

program test
  implicit none
  integer :: tmp(ly,1), a(lx,ly), b(lx,ly)
  integer :: x,y,i

  do x=1,lx
     do y=1,ly
        a(x,y) = x+y
     end do
  end do

  !$acc parallel num_gangs(1)
  !$acc loop worker private(tmp)
  do x=1,lx
     !$acc loop vector
     do y=1,ly
        tmp(y,1) = -a(x,y)
     end do
     !$acc loop vector
     do y=1,ly
        b(x,y) = -tmp(y,1)
     end do
  end do
  !$acc end parallel

  print *, "check"
  do x=1,lx
     do y=1,ly
        if(b(x,y) /= x+y) print *, x, y, b(x,y), x+y
     end do
  end do
  print*, "end"
end program

I expected to get b == a, but that is not the case.

Note that I deliberately declared tmp(ly,1): I get the expected result when I declare tmp(ly) as a 1D array. Even though the 1D version works, I'm not sure it fully respects the OpenACC standard.

Am I missing something here?

EDIT: The last loop checks whether b == a and prints the values that are wrong. The expected output (which I get with OpenACC disabled) is:

  check
  end

What I get with OpenACC enabled is something like this (changes between runs):

check
            1            1            5            2
            1            2            6            3
            1            3            7            4
[...]
  end

Answers (2)

Eh Tan

These two acc loops

 !$acc loop vector
 do y=1,ly
    tmp(y,1) = -a(x,y)
 end do
 !$acc loop vector
 do y=1,ly
    b(x,y) = -tmp(y,1)
 end do

will be executed on the GPU at the same time, that is, in parallel. To ensure that tmp is assigned the correct values in the first loop before it is used in the second loop, the two loops have to be in different acc parallel constructs.

The corrected code would look like this:

  do x=1,lx
      !$acc parallel loop
      do y=1,ly
          tmp(y,1) = -a(x,y)
      end do
      !$acc parallel loop
      do y=1,ly
          b(x,y) = -tmp(y,1)
      end do
  end do

Mat Colgrove

This looks like a compiler issue where "tmp" is being shared by the workers instead of each worker getting a private copy. This in turn causes a race condition in your code.

I've filed a problem report with PGI (TPR#27025) and sent it to our engineers for further investigation.

The workaround is to use "gang" instead of "worker" on the outer loop or, as you note, to make "tmp" a one-dimensional array.
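
A minimal sketch of the first workaround, assuming the PGI/NVIDIA OpenACC compiler from the question: the outer loop is partitioned across gangs rather than workers, so each gang gets its own private copy of the 2D tmp. The module and subroutine names (gang_private_sketch, copy_through_tmp) are hypothetical, and the explicit copyin/copyout data clauses are an assumption added for clarity (the original relies on implicit data movement). It has not been checked against PGI 18.10 specifically.

```fortran
! Sketch of the "gang instead of worker" workaround.
! Hypothetical names: gang_private_sketch, copy_through_tmp.
module gang_private_sketch
  implicit none
  integer, parameter :: lx = 7000, ly = 500   ! sizes from the question
contains
  subroutine copy_through_tmp(a, b)
    integer, intent(in)  :: a(lx,ly)
    integer, intent(out) :: b(lx,ly)
    integer :: tmp(ly,1)   ! same 2D shape that triggered the problem
    integer :: x, y

    ! Outer loop on gangs: each gang gets its own private tmp.
    ! copyin/copyout are assumed explicit data clauses.
    !$acc parallel loop gang private(tmp) copyin(a) copyout(b)
    do x = 1, lx
       ! First vector loop fills this gang's tmp with -a(x,:)
       !$acc loop vector
       do y = 1, ly
          tmp(y,1) = -a(x,y)
       end do
       ! Second vector loop reads it back, so b(x,:) should equal a(x,:)
       !$acc loop vector
       do y = 1, ly
          b(x,y) = -tmp(y,1)
       end do
    end do
  end subroutine copy_through_tmp
end module gang_private_sketch
```

Calling copy_through_tmp(a, b) with a(x,y) = x+y should leave b == a. Compiled without -acc, the !$acc lines are treated as comments and the loops run serially, which gives a quick reference result to compare the GPU run against.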

Update: TPR #27025 was fixed in the PGI 19.7 release.