Lance

Reputation: 39

OpenACC: no updated values in shared variables

I have a question about shared variables in a loop. I want to get the values of i and j after the loop has finished. Here is the code:

#include<iostream>
#include<openacc.h>
#define N 5

using namespace std;

int main(){

int a[N];
int i=0,j=0;
#pragma acc enter data copyin(a[:N])
#pragma acc enter data copyin(i)
#pragma acc enter data copyin(j)
#pragma acc parallel loop present(a[:N],i,j) 
    for(int i=0;i<N;i++){
           #pragma acc loop
        for(int j=0;j<N;j++){
                 a[i]=i+j;         
                        
        }
    
    
    }
//#pragma acc data copyout(i,j)


#pragma acc update self(a[:N],i,j)


cout<<"a[N]"<<a[N-1]<<endl;
cout<<i<<endl;
cout<<j<<endl;

}

After compiling, I got the following compiler feedback:


      Generating present(a[:],j,i)

      Generating NVIDIA GPU code

      15, #pragma acc loop gang /* blockIdx.x */

      17, #pragma acc loop seq

  17, Loop is parallelizable

  30, Generating update self(i,j,a[:])

$ ./testg



a[N]8
0
0

I don't quite understand this. I declared the array a[] and the variables i and j as shared variables, and a[] has been updated, but the values of i and j have not changed. As I recall, #pragma acc enter data [clause] can be used outside of a compute region.

Thanks in advance.

Upvotes: 0

Views: 131

Answers (1)

Mat Colgrove

Reputation: 5646

but there is no change of values in i and j.

You've asked this same question a few times before, so I'll assume I've not been clear. Let's try again.

Consider your code:

int i=0,j=0;
    for(int i=0;i<N;i++){
        for(int j=0;j<N;j++){
                 a[i]=i+j;  

The i and j declared in the body of main are different variables from the i and j you declare in the for loops. Although they have the same names, they have different scopes and different storage, so the loop variables shadow the ones from main. Updating "j" in the for loop therefore does not update the "j" from main, and printing the "j" from main gives zero because it has never changed. This is not an OpenACC issue; the same thing happens when the code is compiled without OpenACC.
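The scoping point can be checked with plain C++ and no OpenACC at all. The following is a minimal sketch; the function name outer_after_shadowed_loops and the sum variable are made up for illustration and are not part of the original code:

```cpp
#include <utility>

// Hypothetical helper: returns the outer (i, j) after the nested loops finish.
std::pair<int, int> outer_after_shadowed_loops(int n) {
    int i = 0, j = 0;                  // the "main" i and j
    int sum = 0;
    for (int i = 0; i < n; i++) {      // declares a new i that shadows the outer i
        for (int j = 0; j < n; j++) {  // declares a new j that shadows the outer j
            sum += i + j;              // uses the loop-local i and j
        }
    }
    (void)sum;                         // only here so the loops have a body
    return {i, j};                     // the outer i and j, never modified
}
```

Calling outer_after_shadowed_loops(5) returns the pair (0, 0): the loops only ever touched their own i and j.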

I thought you understood this, since in an update to one of your versions you did not declare i or j in the for loops:

int i=0,j=0;
    for(i=0;i<N;i++){
        for(j=0;j<N;j++){
                 a[i]=i+j; 

The second issue is that loop index variables must be private for the loop to be parallelized. Otherwise there will be a race condition.

Why are you trying to get the last value of i and j? Besides preventing parallelism, it seems unnecessary given they will equal N.
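To illustrate why the final values carry no extra information, here is a serial sketch that reuses the outer i and j as loop indices. The function name outer_after_shared_loops is hypothetical, and this is only meant for sequential code; sharing the indices across parallel threads would be the race condition described above.

```cpp
#include <utility>

// Hypothetical helper: the loops reuse the outer i and j instead of
// declaring new ones, so the outer variables are the loop counters.
std::pair<int, int> outer_after_shared_loops(int n) {
    int i = 0, j = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            // loop body omitted; only the index values matter here
        }
    }
    return {i, j};  // both loops exit when their counter reaches n
}
```

For any n of at least 1, this returns (n, n), so with N defined as 5 both values are simply 5.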

Another thing is that the order in which loop iterations are executed in parallel is non-deterministic. In other words, loop iteration 3 could get executed before loop iteration 1. With this in mind consider your code without trying to get the last value of i and j:

% cat test.cpp
#include<iostream>
#include<openacc.h>
#define N 5

using namespace std;

int main(){

int a[N];
#pragma acc enter data create(a[:N])
#pragma acc parallel loop present(a[:N])
    for(int i=0;i<N;i++){
        #pragma acc loop
        for(int j=0;j<N;j++){
                 a[i]=i+j;
        }
    }
#pragma acc update self(a[:N])
#pragma acc exit data delete(a)
cout<<"a[N]"<<a[N-1]<<endl;

}
% nvc++ test.cpp -acc -Minfo=accel ; a.out
main:
      9, Generating enter data create(a[:])
         Generating present(a[:])
         Generating NVIDIA GPU code
         12, #pragma acc loop gang /* blockIdx.x */
         14, #pragma acc loop seq
     14, Loop is parallelizable
     20, Generating update self(a[:])
         Generating exit data delete(a[:])
a[N]8

This does happen to work, but only because the compiler correctly identifies that the inner "j" loop is not parallelizable and runs it sequentially (the "loop seq" line in the compiler feedback messages). If we update the code to force parallelization of the inner loop (i.e. add "vector" to the inner "loop" directive), every thread in the "j" loop writes to the same "a[i]", so its final value is whatever the last thread to write happened to store.

% cat test.cpp
#include<iostream>
#include<openacc.h>
#define N 5

using namespace std;

int main(){

int a[N];
#pragma acc enter data create(a[:N])
#pragma acc parallel loop present(a[:N])
    for(int i=0;i<N;i++){
        #pragma acc loop vector
        for(int j=0;j<N;j++){
                 a[i]=i+j;
        }
    }
#pragma acc update self(a[:N])
#pragma acc exit data delete(a)
cout<<"a[N]"<<a[N-1]<<endl;

}
% nvc++ test.cpp -acc -Minfo=accel ; a.out
main:
      9, Generating enter data create(a[:])
         Generating present(a[:])
         Generating NVIDIA GPU code
         12, #pragma acc loop gang /* blockIdx.x */
         14, #pragma acc loop vector(32) /* threadIdx.x */
             Interchanging generated vector loop outwards
             Interchanging generated strip mine loop outwards
     14, Loop is parallelizable
     20, Generating update self(a[:])
         Generating exit data delete(a[:])
a[N]4

I'm not sure why you have the j loop at all: it is equivalent to setting "a[i]=i+N-1", and as written it is not parallelizable.
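If the goal is only the final contents of the array, one possible rewrite drops the j loop and parallelizes over i alone. This is a sketch under assumptions rather than a drop-in replacement: the function name fill_last_value and the use of std::vector are made up for illustration, n is assumed to be at least 1, and a copyout clause is used in place of the enter data / update self pair from the original. A compiler without OpenACC enabled ignores the pragma and runs the loop serially with the same result.

```cpp
#include <vector>

// Hypothetical helper: fills a[i] with the value the sequential j loop
// would have left behind, i.e. i + (n - 1). Assumes n >= 1.
std::vector<int> fill_last_value(int n) {
    std::vector<int> a(n);
    int* p = a.data();  // raw pointer so the data clause can name the array section
    // Each iteration writes a distinct element, so there is no race on p[i].
    #pragma acc parallel loop copyout(p[0:n])
    for (int i = 0; i < n; i++) {
        p[i] = i + n - 1;
    }
    return a;
}
```

With n equal to 5, the last element is 4 + 5 - 1 = 8, which matches the "a[N]8" output of the sequential version above.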

Upvotes: 0
