Anti Earth
Understanding `omp_orig` in a custom OpenMP reduction

I've encountered a bug when using Clang[1] with libomp[2] whereby using omp_priv = omp_orig in the initializer of a custom OpenMP reduction silently gives erroneous output. For example:

/* file.cpp */

#include <iostream>
#include <complex>
#include <vector>

// alias for concision; does not affect bug
typedef std::complex<double> comp;

// define custom OpenMP reduction for 'comp'
#pragma omp declare \
    reduction(+ : comp : omp_out += omp_in ) \
    initializer( omp_priv = omp_orig )


int main() {

    // comp-vector should sum to 1000+1000i
    std::vector<comp> vec(1000, comp(1,1));
    comp total = 0;

    // reduce vec using custom reduction
    #pragma omp parallel for reduction(+:total)
    for (int i=0; i<1000; i++)
        total += vec[i];

    // behold; erroneous (!=1000+1000i) total for #threads>1
    std::cout << "total = " << total << std::endl;
    return 0;
}

compiled with

clang++ -lstdc++ -Xclang -fopenmp -lomp file.cpp

will correctly output total = (1000,1000) when run serially, but outputs seemingly arbitrary erroneous values like (3725,3725) when run in parallel (e.g. via export OMP_NUM_THREADS=4). The same code runs fine on all other tested compilers (except MSVC, where the custom reduction syntax is unrecognised, grr).

I can work around this by explicitly initialising the custom reduction to zero, i.e. setting omp_priv = 0, so that the reduction reads:

#pragma omp declare \
    reduction(+ : comp : omp_out += omp_in ) \
    initializer( omp_priv = 0 )

This works in all tested settings. It seems omp_orig is not zero when OpenMP attempts to initialize a thread-private complex<double>. Alas, I am a bit afraid of this solution since I am unsure what omp_orig is, and why it is behaving strangely with clang and libomp.
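For concreteness, here is the workaround in a self-contained form. The helper `sum_with_zero_init` is a hypothetical wrapper I am introducing purely for illustration (it is not in my real code); I would expect it to return the starting value plus the sum of the vector regardless of thread count.

```cpp
#include <complex>
#include <vector>

typedef std::complex<double> comp;

// workaround: private copies start from zero rather than from omp_orig
#pragma omp declare \
    reduction(+ : comp : omp_out += omp_in ) \
    initializer( omp_priv = 0 )

// hypothetical helper (illustration only): sums vec on top of 'start'
comp sum_with_zero_init(const std::vector<comp>& vec, comp start) {
    comp total = start;
    const int n = static_cast<int>(vec.size());

    // reduce vec using the custom reduction declared above
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += vec[i];

    return total;
}
```

Calling it with the vector from the example and a start of 0 should give (1000,1000); a start of 1234 should give (2234,1000).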

The OpenMP spec seems a bit terse on the subject:

"The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced."

What is the "storage of the original variable"? Certainly it does not seem to be the value of the reduced variable before the parallel region, since the workaround above works fine even when we choose a non-zero starting total, e.g.

comp total = 1234;
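To make my expectation concrete, below is a plain C++ model (no OpenMP involved) of how I read the spec: each thread's private copy is set by the initializer, accumulates its share of the loop, and is then combined into the original variable. The function `model_reduction`, its parameters, and the assumption that omp_orig simply holds the pre-region value of the reduced variable are all mine, so this is a sketch of my understanding rather than a statement of what any runtime actually does.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> comp;

// Serial model of a '+' reduction over comp (assumed semantics, not OpenMP):
//   1. each thread's private copy is set by the initializer,
//   2. each thread adds its share of vec to its private copy,
//   3. every private copy is combined into the original variable.
// use_orig = true  models initializer( omp_priv = omp_orig )
// use_orig = false models initializer( omp_priv = 0 )
comp model_reduction(const std::vector<comp>& vec, comp total,
                     int num_threads, bool use_orig) {
    const comp orig = total;  // what omp_orig refers to, on my reading
    std::vector<comp> priv(num_threads);

    for (int t = 0; t < num_threads; t++) {
        priv[t] = use_orig ? orig : comp(0, 0);  // initializer
        for (std::size_t i = t; i < vec.size(); i += num_threads)
            priv[t] += vec[i];                   // loop body
    }

    for (int t = 0; t < num_threads; t++)
        total += priv[t];                        // combiner: omp_out += omp_in

    return total;
}
```

Under this model, a starting total of 0 yields (1000,1000) with either initializer, which is why the parallel output above surprises me. With a non-zero start, the model's omp_priv = omp_orig variant counts the starting value once per thread plus once for the original, so the two initializers are only interchangeable when the start is zero.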

Will this workaround pose any unforeseen issues?

[1] clang v15.0.0 (specifically arm64-apple-darwin23.5.0)

[2] cannot find the version, but it was installed via brew install libomp in July 2024
