asantiagot

Reputation: 153

Using critical region with OpenMP

I have some code that I want to parallelise using OpenMP. The program demonstrates a function that performs polynomial interpolation. My first idea was to parallelise a for loop inside one of the many functions in the code. The serial version of this loop goes like this:

void polint (double xa[], double ya[], int n, double x, double *y, double *dy) {
    int i, m, ns=1;
    double den,dif,dift,ho,hp,w;
    double *c, *d;
    dif = fabs(x-xa[1]); c = vector(1,n);
    d = vector(1,n); for (i=1; i<= n; i++) { 
        dift = fabs (x - xa[i]); 
        if (dift<dif) {
            ns = i;
            dif = dift;
        }
        c[i] = ya[i];
        d[i] = ya[i]; 
    }
    *y = ya[ns--];
    for (m = 1; m < n; m++) { 
        for (i = 1; i<= n-m; i++) {
            ho = xa[i] - x;
            hp = xa[i+m] - x;
            w = c[i+1] - d[i]; den = ho - hp;
            den = w / den; d[i] = hp * den; c[i] = ho * den;
        }
    *y += (*dy=(2*ns < (n-m) ? c[ns+1] : d[ns--])); 
    }
    free_vector (d, 1, n); free_vector (c, 1, n); 
}

And now, using OpenMP, I built this code:

void polint (double xa[], double ya[], int n, double x, double *y, double *dy) {
    int i, m, ns=1;
    double den,dif,dift,ho,hp,w;
    double *c, *d;
    int nth;
    dif = fabs(x-xa[1]); c = vector(1,n);
    d = vector(1,n); for (i=1; i<= n; i++) { 
        dift = fabs (x - xa[i]); 
        if (dift<dif) {
            ns = i;
            dif = dift;
        }
        c[i] = ya[i];
        d[i] = ya[i]; 
    }
    #pragma omp parallel
    {
    *y = ya[ns--];    
        nth = omp_get_num_threads();
        for (m = 1; m < n; m++) {
            #pragma omp for 
            for (i = 1; i<= n-m; i++) {
                #pragma omp critical 
                {
                    ho = xa[i] - x;
                    hp = xa[i+m] - x;
                    w = c[i+1] - d[i]; den = ho - hp;
                    den = w / den; d[i] = hp * den; c[i] = ho * den;
                }
            }
            #pragma omp critical
            *y += (*dy=(2*ns < (n-m) ? c[ns+1] : d[ns--])); 
        }
        free_vector (d, 1, n); free_vector (c, 1, n);
    } 
}

The program compiles correctly, but the problem comes when I run it. I get the following error message:

[Error message displayed when running the program]

How can I solve this? I first thought that the array bounds were being overstepped, so I added the critical regions so that multiple threads wouldn't access the array at the same time. However, this didn't solve the problem. Thanks in advance!

Upvotes: 1

Views: 210

Answers (1)

Gilles

Reputation: 9489

Aside from the obvious problem of freeing the c and d pointers inside the parallel region (every thread frees them, which generates the crash you experience), there are plenty of problems in your code. To better pinpoint them, I rewrote your sequential function with more sensible variable declarations and correct indentation. Here is what it gives me:

void polint( double xa[], double ya[], int n, double x, double *y, double *dy ) {
    double *c = vector( 1, n );
    double *d = vector( 1, n );

    double dif = fabs( x - xa[1] );
    int ns = 1;
    for ( int i = 1; i <= n; i++ ) { 
        double dift = fabs( x - xa[i] ); 
        if ( dift < dif ) {
            ns = i;
            dif = dift;
        }
        c[i] = ya[i];
        d[i] = ya[i]; 
    }

    *y = ya[ns--];
    for ( int m = 1; m < n; m++ ) { 
        for ( int i = 1; i <= n - m; i++ ) {
            double ho = xa[i] - x;
            double hp = xa[i + m] - x;
            double den = ( c[i + 1] - d[i] ) / ( ho - hp );
            d[i] = hp * den;
            c[i] = ho * den;
        }
        *dy = 2 * ns < ( n - m ) ? c[ns + 1] : d[ns--];
        *y += *dy; 
    }

    free_vector( d, 1, n );
    free_vector( c, 1, n ); 
}

Now, reading the code becomes simpler (although it remains quite obfuscated) and a few issues become apparent:

  • Do you declare your arrays to be indexed from 1 to n rather than from 0 to n-1? If so, that's unusual (and error-prone). If not, you probably have some problems in your loops (notably the first i loop).
  • dy's value is rewritten at every iteration of the m loop. Since it is an output parameter, its final value will only be the one corresponding to the last value of m. That is not an issue per se, but it seems suspicious... Are you sure it is correct?
  • Now, regarding the possible parallelisation, we can see from the definition of den and the subsequent update of c that c[i] depends on c[i+1]. From a parallelisation perspective, this is a big problem: the sequential order of iterations is what guarantees that c[i+1] is read before it is overwritten, so the code as written cannot be parallelised. That doesn't close the door on parallelisation altogether, but it means there will be more work to do than just sprinkling in some OpenMP directives (see the toy example below).
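
To make that last point concrete, here is a stripped-down toy with the same read/write pattern (none of the actual polint arithmetic). Sequentially, iteration i reads c[i + 1] before iteration i + 1 overwrites it; under a naive #pragma omp parallel for, some thread may overwrite c[i + 1] before another thread reads it, so the result would depend on scheduling:

#include <stdio.h>

int main( void ) {
    double c[6] = { 0, 1, 2, 3, 4, 5 };   /* indices 1..5 used, as in the question */
    int n = 5;

    /* Same dependency shape as polint: c[i] is computed from the OLD
       c[i + 1]. Parallelising this loop as-is would let iteration i + 1
       overwrite c[i + 1] before iteration i gets to read it. */
    for ( int i = 1; i < n; i++ )
        c[i] = c[i + 1] - c[i];

    for ( int i = 1; i <= n; i++ )
        printf( "%g ", c[i] );            /* prints: 1 1 1 1 5 */
    printf( "\n" );
    return 0;
}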

Now, aside from these issues, if we take a look at your parallelised version, there are plenty of other problems:

  • Already mentioned: the freeing of the vectors inside the parallel region means every thread frees them; it should definitely be moved outside.
  • Enormous issue: you do not declare private the data that should be. At the very least, m, ho, hp, w and den should be declared private. Moreover, I strongly suspect that *y should be declared reduction(+) and *dy lastprivate (although doing that properly would need some finer tuning; see the toy example after this list).
  • To compensate for the lack of private declarations, you enclosed the full body of your parallel loop in a critical region, essentially re-serialising it. Unfortunately, it is both super-ineffective and wrong, since, as we saw, the order of iterations matters, and a critical region doesn't preserve it...
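
Here is a toy illustration of those data-sharing clauses, kept deliberately separate from polint (the names tmp and acc are made up for this example). One subtlety worth knowing: a pointer dereference such as *y is not a valid list item for these clauses, so in the real function you would reduce into a local scalar and store it back through the pointer afterwards:

#include <stdio.h>

int main( void ) {
    int n = 1000;
    double tmp;                       /* scratch: must be private          */
    double acc = 0.0;                 /* local stand-in for what *y holds  */

    #pragma omp parallel for private( tmp ) reduction( +:acc )
    for ( int i = 0; i < n; i++ ) {
        tmp = 0.5 * i;                /* each thread writes its own copy   */
        acc += tmp;                   /* partial sums merged at the end    */
    }

    printf( "acc = %f\n", acc );      /* same value as the serial loop     */
    return 0;
}

Compiled with OpenMP enabled (e.g. -fopenmp with gcc), this prints the same total regardless of the number of threads.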

So in conclusion, I strongly encourage you to do a few things:

  1. This is very important: de-obfuscate and clean up your code. Indent correctly, give meaningful names to your variables, keep one statement per line, declare your variables as late in the code as possible and initialise them immediately... And one or two sensible comments here and there would come in handy.
  2. Break the i dependency of c. That can easily be done by keeping two copies of it: one corresponding to its value at the previous iteration of the m loop, and one corresponding to its current value (see the sketch after this list).
  3. Be careful to correctly declare your variables private etc. when adding your OpenMP directives. The good news is that, if you follow my advice and delay their declarations as much as possible, you will have very few of them to look after, since most declarations will be inside the parallel region, making the corresponding variables automatically private, as they should be.
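
To give an idea of what points 2 and 3 could look like together, here is one possible shape for the m loop, as a sketch only (untested, and c_old is a scratch buffer I introduce here, allocated with the same vector() helper your code already uses). Each m pass first snapshots c, so every iteration reads the previous pass's values no matter which thread runs it:

/* Requires <string.h> for memcpy. */
double *c_old = vector( 1, n );

*y = ya[ns--];
for ( int m = 1; m < n; m++ ) {
    /* Snapshot c: iterations must read the values from the previous
       m pass, and the copy makes that true in any execution order. */
    memcpy( &c_old[1], &c[1], n * sizeof( double ) );

    /* i, ho, hp and den are declared inside the construct, hence
       automatically private (point 3). d[i] is only read and written
       by iteration i, so it needs no copy of its own. */
    #pragma omp parallel for
    for ( int i = 1; i <= n - m; i++ ) {
        double ho  = xa[i] - x;
        double hp  = xa[i + m] - x;
        double den = ( c_old[i + 1] - d[i] ) / ( ho - hp );
        d[i] = hp * den;
        c[i] = ho * den;
    }

    /* Stays sequential: a single update per m pass, as in the serial code. */
    *dy = 2 * ns < ( n - m ) ? c[ns + 1] : d[ns--];
    *y += *dy;
}

free_vector( c_old, 1, n );

Note that this re-creates the thread team at every m pass; hoisting a single parallel region outside the m loop (with omp for inside, and omp single around the memcpy and the *y update) would be a further refinement, at the cost of more care with synchronisation.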

With that, you should be able to get a correct and (hopefully) effective parallel version of your code.

Upvotes: 3
