I've encountered a bug when using Clang[1] with libomp[2] whereby using omp_priv = omp_orig in the initializer of a custom OpenMP reduction silently gives erroneous output. For example:
/* file.cpp */
#include <iostream>
#include <complex>
#include <vector>

// alias for concision; does not affect bug
typedef std::complex<double> comp;

// define custom OpenMP reduction for 'comp'
#pragma omp declare \
    reduction(+ : comp : omp_out += omp_in ) \
    initializer( omp_priv = omp_orig )

int main() {
    // comp-vector should sum to 1000+1000i
    std::vector<comp> vec(1000, comp(1,1));
    comp total = 0;

    // reduce vec using custom reduction
    #pragma omp parallel for reduction(+:total)
    for (int i=0; i<1000; i++)
        total += vec[i];

    // behold; erroneous (!=1000+1000i) total for #threads>1
    std::cout << "total = " << total << std::endl;
    return 0;
}
compiled with
clang++ -lstdc++ -Xclang -fopenmp -lomp file.cpp
will correctly output total = (1000,1000) when run serially, but outputs seemingly arbitrary erroneous values like (3725,3725) when run in parallel (e.g. via export OMP_NUM_THREADS=4). The same code runs fine on all other tested compilers (except MSVC, where the custom reduction syntax is unrecognised, grr).
I can work around this by explicitly initialising the custom reduction to zero, i.e. setting omp_priv = 0, so that the reduction reads:
#pragma omp declare \
reduction(+ : comp : omp_out += omp_in ) \
initializer( omp_priv = 0 )
This works in all tested settings. It seems omp_orig is not zero when OpenMP attempts to initialize a thread-private complex<double>. Alas, I am a bit afraid of this solution since I am unsure what omp_orig is, and why it is behaving strangely with clang and libomp.
The OpenMP spec seems a bit terse on the subject:
"The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced."
What is the "storage of the original variable"? Certainly it does not seem to be the value of the reduced variable before the parallel region, since the workaround above works fine even when we choose a non-zero starting total, e.g.
comp total = 1234;
Will this workaround pose any unforeseen issues?
[1] clang v15.0.0 (specifically arm64-apple-darwin23.5.0)
[2] cannot find the version, but it was installed via brew install libomp in July 2024