Anthony Jack

Reputation: 5553

OpenMP parallel for loop issue with shared pointer to pointers

I am just learning OpenMP and am having a problem parallelizing a for loop that I'm pretty sure meets the criteria for parallelization. Below is an SSCCE that exhibits the issue I'm having in my actual code.

The problem is that when I compile and run this with OpenMP, the contents of wr[i][j] and wi[i][j] are always 1.00 and 0.00 respectively. If I comment out the pragma and run the code serially, wr and wi contain the values I expect. The code exhibits the problem even when run with just 1 thread after being compiled with OpenMP; the pragma has to be commented out to run it serially.

I think my problem is related to the fact that wr and wi are pointers to pointers. If that's the case, how do I work around it? If that's not the problem what is? Any insights are greatly appreciated.

// compilation gcc -fopenmp -o test test.c -lm 
// usage ./test <Problem Size (Must be a power of 2)> <Num Threads>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main (int argc, char *argv[]){
    double **wr, **wi;  
    double *data;
    double wtemp, wpr, wpi, theta;
    int i, j, n, nn, mmax, istep, num_mmax_iterations, T;
    const int isign = -1;

    if(argc < 3){
        printf("ERROR, usage: %s <N (must be a power of 2)> <T>\n", argv[0]);
        exit(1);
    } else {
        n = atoi(argv[1]);
        T = atoi(argv[2]);
    }

    num_mmax_iterations = log2(n);
    wr = malloc(sizeof(double *) *num_mmax_iterations);
    wi = malloc(sizeof(double *) *num_mmax_iterations);

    mmax = 2;
    for(i = 0; i < num_mmax_iterations; i++) {
        wr[i] = malloc(sizeof(double) * (mmax >> 1));
        wi[i] = malloc(sizeof(double) * (mmax >> 1));
        wr[i][0] = 1.0;
        wi[i][0] = 0.0;
        mmax <<= 1;
    }

    mmax = 2;
    #pragma omp parallel for private(i, j, theta, wtemp, wpr, wpi) firstprivate(mmax) shared(num_mmax_iterations, wr, wi) num_threads(T)
    for(i = 0; i < num_mmax_iterations; i++) {
        theta = isign * (6.28318530717959 / mmax);
        wtemp = sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = sin(theta);       

        for(j = 1; j < (mmax >> 1); j++) {
            wr[i][j] = wr[i][j - 1] * wpr - wi[i][j - 1] * wpi + wr[i][j - 1];
            wi[i][j] = wi[i][j - 1] * wpr + wr[i][j - 1] * wpi + wi[i][j - 1];
        }

        mmax <<= 1;
    }

    for(i = 0; i < num_mmax_iterations; i++) {      
        for(j = 0; j < (mmax >> 1); j++) {
            printf("%.2f | %.2f\n", wr[i][j], wi[i][j]);
        }
    }

    for(i = 0; i < num_mmax_iterations; i++) {
        free(wr[i]);
        free(wi[i]);
    }
    free(wr);
    free(wi);
    return 0;
}

Edit

Expected output with the pragma removed, running ./test 8 1:

1.00 | 0.00
0.00 | 0.00
0.00 | 1.00
0.00 | -0.00
1.00 | 0.00
-0.00 | -1.00
0.00 | 1.00
-1.00 | 0.71
1.00 | -0.00
0.71 | -0.71
-0.00 | 0.00
-0.71 | -0.71
0.00 | -1.00
-0.71 | -0.71
-1.00 | 1.00
-0.71 | 0.92
1.00 | 0.00
-0.00 | -1.00
0.00 | 1.00
-1.00 | 0.71
1.00 | -0.00
0.71 | -0.71
-0.00 | 0.00
-0.71 | -0.71
0.00 | -1.00
-0.71 | -0.71
-1.00 | 1.00
-0.71 | 0.92
1.00 | 0.71
0.92 | 0.38
0.71 | -0.00
0.38 | -0.38
1.00 | 0.00
0.71 | -0.71
-0.00 | -1.00
-0.71 | -0.71
0.00 | 1.00
-0.71 | 0.92
-1.00 | 0.71
-0.71 | 0.38
1.00 | -0.00
0.92 | -0.38
0.71 | -0.71
0.38 | -0.92
-0.00 | 0.00
-0.38 | -0.38
-0.71 | -0.71
-0.92 | -0.92
1.00 | 0.00
0.92 | -0.38
0.71 | -0.71
0.38 | -0.92
-0.00 | -1.00
-0.38 | -0.92
-0.71 | -0.71
-0.92 | -0.38
0.00 | 0.00
-0.38 | 0.00
-0.71 | 0.00
-0.92 | 0.00
-1.00 | 0.00
-0.92 | 0.00
-0.71 | 0.00
-0.38 | 0.00

Actual output with the pragma left intact, running ./test 8 with any number of threads:

1.00 | 0.00
1.00 | 0.00
1.00 | 0.00

Note: The actual problem sizes are much larger and the expected data usually varies more than it does above.

Upvotes: 1

Views: 1917

Answers (2)

tim18

Reputation: 620

The suggestion about making mmax local to each thread is well taken. Not so great is the combination of old versions of C and C++. Given OpenMP's support for C89, I would still list the per-iteration variables as private explicitly, which is required for this to work.
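
To make that concrete, here is a minimal sketch (my wording, not code from either answer) of what an explicitly scoped directive could look like, assuming the variable names from the question and recomputing mmax per iteration as suggested in the other answer:

// Sketch only: the loop variable and every per-iteration temporary listed as private.
#pragma omp parallel for private(i, j, theta, wtemp, wpr, wpi, mmax) shared(num_mmax_iterations, wr, wi) num_threads(T)
for(i = 0; i < num_mmax_iterations; i++) {
    mmax = 2 << i;  // derive the per-thread value instead of updating a shared one
    // ... same inner loop over wr[i][j] and wi[i][j] as in the question ...
}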

Upvotes: 0

yhenon

Reputation: 4291

When the code is run serially, the value of mmax is modified at each iteration of the outer loop, so the iterations are not independent as written. Your code needs to be modified so that each thread calculates the correct value of mmax for itself. In addition, the outer loop variable i does not need to appear in the private clause: the index of a parallel for loop is predetermined private, so each thread gets its own copy automatically. Here's a suggestion for your OpenMP loop which gives me the same results with and without the pragma.

#pragma omp parallel for private(j, theta, wtemp, wpr, wpi, mmax) num_threads(T)
for(i = 0; i < num_mmax_iterations; i++) {
    mmax = pow(2,i+1);  // each thread derives its own mmax from i instead of carrying it across iterations
    theta = isign * (6.28318530717959 / mmax);
    wtemp = sin(0.5 * theta);
    wpr = -2.0 * wtemp * wtemp;
    wpi = sin(theta);       

    for(j = 1; j < (mmax >> 1); j++) {
        wr[i][j] = wr[i][j - 1] * wpr - wi[i][j - 1] * wpi + wr[i][j - 1];
        wi[i][j] = wi[i][j - 1] * wpr + wr[i][j - 1] * wpi + wi[i][j - 1];
    }

}
// mmax has to be set at the end since there is no guarantee of which thread will finish last
mmax = pow(2,1+num_mmax_iterations);
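
As a side note (my addition, not part of the answer above): since mmax is always a power of two here, an integer shift gives the same values as the pow calls without going through floating point:

mmax = 2 << i;                   // same value as pow(2, i + 1) inside the loop
// ... rest of the loop unchanged ...
mmax = 2 << num_mmax_iterations; // same value as pow(2, 1 + num_mmax_iterations) after the loop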

Upvotes: 1
