Angel de Vicente

Reputation: 1928

I cannot get !$acc parallel to work (but !$acc kernels does)

I've been trying to use OpenACC with some simple code, but I guess I don't fully understand how to write nested OpenACC loops or what the "private" clause does. The routine that I'm trying to parallelize is:

SUBROUTINE zcs(zc,kmin,kmax,ju2,jl2)
INTEGER, INTENT(IN) :: kmin,kmax,ju2,jl2
DOUBLE PRECISION, DIMENSION(-jl2:jl2,-jl2:jl2,-ju2:ju2,-ju2:ju2,kmin:kmax,kmin:kmax,-kmax:kmax) :: zc

INTEGER :: k,kp,k2,km,kp2,q,q2,mu2,ml2,p2,mup2,pp2,mlp2,ps2,pt2
DOUBLE PRECISION :: z0,z1,z2,z3,z4,z5,z6,z7

! Start loop over K, K' and Q
!$acc kernels
do k=kmin,kmax
   do kp=kmin,kmax
      k2=2*k
      km = MIN(k,kp)
      kp2=2*kp
      z0=3.d0*dble(ju2+1)*dsqrt(dble(k2+1))*dsqrt(dble(kp2+1))
      do q=-km,km
         q2=2*q

         ! Calculate quantity C and its sum over magnetic quantum numbers
         do mu2=-ju2,ju2,2
            do ml2=-jl2,jl2,2
               p2=mu2-ml2
               if(abs(p2).gt.2) cycle
               z1=w3js(ju2,jl2,2,mu2,-ml2,-p2)
               do mup2=-ju2,ju2,2
                  if(mu2-mup2.ne.q2) cycle
                  pp2=mup2-ml2
                  if(abs(pp2).gt.2) cycle
                  z2=w3js(ju2,jl2,2,mup2,-ml2,-pp2)
                  do mlp2=-jl2,jl2,2
                     ps2=mu2-mlp2
                     if(abs(ps2).gt.2) cycle
                     pt2=mup2-mlp2
                     if(abs(pt2).gt.2) cycle
                     z3=w3js(ju2,jl2,2,mu2,-mlp2,-ps2)
                     z4=w3js(ju2,jl2,2,mup2,-mlp2,-pt2)
                     z5=w3js(2,2,k2,-p2,pp2,q2)
                     z6=w3js(2,2,kp2,-ps2,pt2,q2)
                     z7=1.d0
                     if(mod(2*ju2-ml2-mlp2,4).ne.0) z7=-1.d0
                     zc(ml2,mlp2,mu2,mup2,k,kp,q)=z0*z1*z2*z3*z4*z5*z6*z7
                  enddo
               enddo
            enddo
         enddo

      end do
   end do
end do

!$acc end kernels
END SUBROUTINE zcs

As it is, the code behaves fine: if I compare the zc matrix after calling this routine, the non-OpenACC and the OpenACC versions give identical answers. But if I try to do it with a parallel directive, there seems to be a race condition that I cannot locate. The relevant changes are just:

!$acc parallel
!$acc loop private(k,kp,k2,km,kp2,z0,q,q2)
do k=kmin,kmax
   do kp=kmin,kmax
      k2=2*k
      km = MIN(k,kp)
      kp2=2*kp
      z0=3.d0*dble(ju2+1)*dsqrt(dble(k2+1))*dsqrt(dble(kp2+1))
      do q=-km,km
         q2=2*q

         ! Calculate quantity C and its sum over magnetic quantum numbers
         !$acc loop private(mu2,ml2,p2,z1,mup2,pp2,z2,mlp2,ps2,pt2,z3,z4,z5,z6,z7)
         do mu2=-ju2,ju2,2

 [...]

!$acc end parallel

As far as I can see, I have declared the appropriate variables as private, but I guess I don't fully understand how to nest several loops and/or what private really does. Any suggestions to help me properly understand what is going on?

Many thanks, AdV

Upvotes: 0

Views: 73

Answers (1)

Mat Colgrove

Reputation: 5646

The core problem here is that you're passing the loop-bound variables "ju2" and "jl2" by reference to the "w3js" routine. Because the compiler has to assume that w3js might modify them, the trip counts of the inner loops could change while those loops execute, which prevents parallelization. You could try making these variables private, but the easiest fix is to add the "VALUE" attribute to w3js' arguments so they are passed by value.
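A minimal sketch of that fix, assuming w3js computes the Wigner 3j symbol from doubled (2j, 2m) integer arguments, as the calls in the question suggest. The original w3js is not shown, so the Racah-formula body, the module name w3j_mod and the helper fact are hypothetical stand-ins. The relevant parts are the VALUE attribute on every dummy argument and the "!$acc routine seq" directive, which compiles the function for the device so it can be called inside a compute region.

```fortran
module w3j_mod
  implicit none
contains

  ! Hypothetical helper: n! as a double (n stays small here, so a loop is enough).
  pure double precision function fact(n)
    !$acc routine seq
    integer, value, intent(in) :: n
    integer :: i
    fact = 1.d0
    do i = 2, n
      fact = fact * dble(i)
    end do
  end function fact

  ! Hypothetical stand-in for the question's w3js: Wigner 3j symbol via the
  ! Racah formula. Every argument is twice the physical j or m value, as in
  ! the question's calls, e.g. w3js(ju2,jl2,2,mu2,-ml2,-p2).
  ! VALUE gives the callee its own copy of each argument, so nothing it does
  ! can change the caller's loop bounds (ju2, jl2, ...).
  pure double precision function w3js(j1, j2, j3, m1, m2, m3)
    !$acc routine seq
    integer, value, intent(in) :: j1, j2, j3, m1, m2, m3
    integer :: k, kmin, kmax, e
    double precision :: pref, s, term

    w3js = 0.d0
    ! Selection rules: m's sum to zero, |m| <= j, j+m even, triangle rule
    if (m1 + m2 + m3 /= 0) return
    if (abs(m1) > j1 .or. abs(m2) > j2 .or. abs(m3) > j3) return
    if (mod(j1 + m1, 2) /= 0 .or. mod(j2 + m2, 2) /= 0 .or. mod(j3 + m3, 2) /= 0) return
    if (j3 < abs(j1 - j2) .or. j3 > j1 + j2 .or. mod(j1 + j2 + j3, 2) /= 0) return

    ! Square root of the triangle coefficient times the (j +/- m)! factors
    pref = fact((j1 + j2 - j3)/2) * fact((j1 - j2 + j3)/2) * fact((j2 + j3 - j1)/2) &
         / fact((j1 + j2 + j3)/2 + 1)
    pref = sqrt(pref * fact((j1 + m1)/2) * fact((j1 - m1)/2) * fact((j2 + m2)/2) &
                     * fact((j2 - m2)/2) * fact((j3 + m3)/2) * fact((j3 - m3)/2))

    ! Racah sum over k, limited to non-negative factorial arguments
    kmin = max(0, (j2 - j3 - m1)/2, (j1 - j3 + m2)/2)
    kmax = min((j1 + j2 - j3)/2, (j1 - m1)/2, (j2 + m2)/2)
    s = 0.d0
    do k = kmin, kmax
      term = 1.d0 / (fact(k) * fact((j3 - j2 + m1)/2 + k) * fact((j3 - j1 - m2)/2 + k) &
                   * fact((j1 + j2 - j3)/2 - k) * fact((j1 - m1)/2 - k) * fact((j2 + m2)/2 - k))
      if (mod(k, 2) /= 0) term = -term
      s = s + term
    end do

    ! Overall phase (-1)**(J1 - J2 - M3); the exponent is an integer here
    e = (j1 - j2 - m3)/2
    if (mod(e, 2) /= 0) s = -s
    w3js = pref * s
  end function w3js

end module w3j_mod
```

A procedure with a VALUE dummy argument needs an explicit interface at the call site, so zcs would have to see this function through use association (for example "use w3j_mod" in the scope that contains zcs). With that in place, the compiler knows that w3js cannot write to ju2 or jl2.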

Note that it works in the "kernels" case because the compiler only parallelizes the outer loops. In the "parallel" case, you're explicitly asking it to parallelize these "non-countable" inner loops.
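For the nesting question itself, here is a minimal sketch of how the parallelism levels can be spelled out under "parallel". It uses a toy loop nest rather than the real zcs body, and the module nest_demo, the subroutine fill and the function f are hypothetical. The two tightly nested outer loops are collapsed and spread over gangs, the inner loop is spread over vector lanes, and each scalar is made private at the level where it is written. Loop variables of loops that carry a loop directive are private by default, so they do not need to be listed.

```fortran
module nest_demo
  implicit none
contains

  ! Hypothetical callee standing in for w3js. VALUE means a call can never
  ! change the caller's n, so the inner loop bound stays fixed.
  pure double precision function f(n, i)
    !$acc routine seq
    integer, value, intent(in) :: n, i
    f = dble(n * i)
  end function f

  subroutine fill(a, n, kmax)
    integer, intent(in) :: n, kmax
    double precision, intent(out) :: a(-n:n, 0:kmax, 0:kmax)
    integer :: k, kp, i
    double precision :: z0, z1

    ! Gang level: the two tightly nested outer loops form one iteration
    ! space. z0 is written once per (k,kp) pair, so each gang gets a copy.
    !$acc parallel loop gang collapse(2) private(z0) copyout(a)
    do k = 0, kmax
      do kp = 0, kmax
        z0 = dble(k + kp)
        ! Vector level: z1 is written in every iteration of i, so each
        ! vector lane needs its own copy; i is private automatically.
        !$acc loop vector private(z1)
        do i = -n, n
          z1 = f(n, i)
          a(i, k, kp) = z0 + z1
        end do
      end do
    end do
  end subroutine fill

end module nest_demo
```

If an inner loop should not be parallelized at all, "!$acc loop seq" on that loop states this explicitly instead of leaving the decision to the compiler.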

Upvotes: 1
