Reputation: 51
I'm trying to use a worker-private array with OpenACC, but I keep getting wrong results. I guess there is some kind of race condition going on, but I can't find where.
I'm using the PGI compiler (18.10, OpenPower) and compile with:
pgf90 -gopt -O3 -Minfo=all -Mcuda=ptxinfo -acc -ta=tesla:cc35 main.F90
Here is a minimal example of what I'm trying to achieve:
#define lx 7000
#define ly 500
program test
  implicit none
  integer :: tmp(ly,1), a(lx,ly), b(lx,ly)
  integer :: x,y,i
  do x=1,lx
    do y=1,ly
      a(x,y) = x+y
    end do
  end do
  !$acc parallel num_gangs(1)
  !$acc loop worker private(tmp)
  do x=1,lx
    !$acc loop vector
    do y=1,ly
      tmp(y,1) = -a(x,y)
    end do
    !$acc loop vector
    do y=1,ly
      b(x,y) = -tmp(y,1)
    end do
  end do
  !$acc end parallel
  print *, "check"
  do x=1,lx
    do y=1,ly
      if(b(x,y) /= x+y) print *, x, y, b(x,y), x+y
    end do
  end do
  print*, "end"
end program
What I expected was to get b == a, but that's not the case.
Please note that I defined tmp(ly,1) because I get the expected result when I define tmp(ly) as a 1D array. Even if it works with a 1D array, I'm not sure it fully respects the OpenACC standard.
Am I missing something here?
EDIT: The last loop checks whether b == a and prints the values that are wrong. The expected output (which I get with OpenACC disabled) is:
check
end
What I get with OpenACC enabled is something like this (changes between runs):
check
1 1 5 2
1 2 6 3
1 3 7 4
[...]
end
Upvotes: 1
Views: 499
Reputation: 46
These two acc loops
  !$acc loop vector
  do y=1,ly
    tmp(y,1) = -a(x,y)
  end do
  !$acc loop vector
  do y=1,ly
    b(x,y) = -tmp(y,1)
  end do
will be executed on the GPU at the same time. That is, they are executed in parallel. To ensure tmp is assigned the correct values in the first loop before it is used in the second loop, they have to be in different acc parallel constructs.
The correct code will look like:
do x=1,lx
  !$acc parallel loop
  do y=1,ly
    tmp(y,1) = -a(x,y)
  end do
  !$acc parallel loop
  do y=1,ly
    b(x,y) = -tmp(y,1)
  end do
end do
Upvotes: 0
Reputation: 5646
Looks like a compiler issue where "tmp" is being shared by the workers instead of each worker getting a private copy. This in turn causes a race condition in your code.
I've filed a problem report with PGI (TPR#27025) and sent it to our engineers for further investigation.
The workaround is to use "gang" instead of "worker" on the outer loop or, as you note, to make "tmp" a single-dimension array.
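For illustration, here is a minimal sketch of the "gang" variant, based on the code in the question. A few things in it are my assumptions rather than part of the original: the program name, the use of integer parameters instead of the #define lines (so no preprocessing is needed), dropping num_gangs(1) so that more than one gang can run, and the mismatch counter nbad. I have not verified it against every PGI release, so treat it as a starting point.

```fortran
program gang_private_sketch
  implicit none
  ! Sizes from the question; parameters replace the #define lines (assumption).
  integer, parameter :: lx = 7000, ly = 500
  integer :: tmp(ly,1), a(lx,ly), b(lx,ly)
  integer :: x, y, nbad

  ! Same initialisation as in the question.
  do y = 1, ly
    do x = 1, lx
      a(x,y) = x + y
    end do
  end do

  ! Workaround: "gang" instead of "worker" on the outer loop, so that tmp
  ! is privatised per gang. num_gangs(1) is dropped here (assumption).
  !$acc parallel
  !$acc loop gang private(tmp)
  do x = 1, lx
    !$acc loop vector
    do y = 1, ly
      tmp(y,1) = -a(x,y)
    end do
    !$acc loop vector
    do y = 1, ly
      b(x,y) = -tmp(y,1)
    end do
  end do
  !$acc end parallel

  ! Count mismatches instead of printing each one (nbad is a hypothetical helper).
  nbad = 0
  do y = 1, ly
    do x = 1, lx
      if (b(x,y) /= a(x,y)) nbad = nbad + 1
    end do
  end do
  print *, "mismatches:", nbad
  if (nbad /= 0) error stop 1
end program gang_private_sketch
```

Built without OpenACC, the directives are treated as comments and the program runs serially with zero mismatches. With OpenACC enabled, a non-zero count would suggest that tmp is still being shared.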
Update: TPR #27025 was fixed in the PGI 19.7 release.
Upvotes: 1