Reputation: 45444
The following simple code doesn't give the expected result with gcc 4.7.0. Is this correct or a bug?
unsigned count_err(std::vector<unsigned> const& num, unsigned mask)
{
    unsigned c = 0;
    // allows the lambda to be reused later (not in this simple example)
    auto f = [&] (unsigned i) { if (i & mask) ++c; };
    #pragma omp parallel for reduction(+:c)
    for (unsigned i = 0; i < num.size(); ++i)
        f(num[i]);
    return c;
}
This returns zero: the reduction of c from the lambda function is not done. Btw, I expected the result to be the one returned by the serial function:
unsigned count_ser(std::vector<unsigned> const& num, unsigned mask)
{
    unsigned c = 0;
    auto f = [&] (unsigned i) { if (i & mask) ++c; };
    std::for_each(num.begin(), num.end(), f);
    return c;
}
The following implementations give the expected result (in both cases, the code that increments the reduction variable is moved into the parallel region):
unsigned count_ok1(std::vector<unsigned> const& num, unsigned mask)
{
    unsigned c = 0;
    auto f = [&] (unsigned i) -> bool { return i & mask; };
    #pragma omp parallel for reduction(+:c)
    for (unsigned i = 0; i < num.size(); ++i)
        if (f(num[i])) ++c;
    return c;
}
unsigned count_ok2(std::vector<unsigned> const& num, unsigned mask)
{
    unsigned c = 0;
    #pragma omp parallel reduction(+:c)
    {
        auto f = [&] (unsigned i) { if (i & mask) ++c; };
        #pragma omp for
        for (unsigned i = 0; i < num.size(); ++i)
            f(num[i]);
    }
    return c;
}
Is the fact that count_err() gives a different result a compiler bug, or is it correct?
Upvotes: 2
Views: 2129
Reputation: 2108
I think it's not a compiler bug. Here is my explanation. In your first example, the lambda holds a reference to the original c variable, the one declared before the parallel region. The thread-local copies of c are created only when we enter the parallel for-loop. So the threads all increment the same original variable (without any synchronization). When we exit the loop, the thread-local copies of c (all equal to zero, because the lambda doesn't know about them) are summed up to give you 0. The count_ok2 version works because the lambda is created inside the parallel region, so it holds a reference to the thread-local copy of c.
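If the goal is to keep the lambda defined outside the parallel region so it can be reused, one option might be to pass the counter in as a parameter instead of capturing it. This is only a minimal sketch: count_param and the counter parameter are names made up for illustration and are not from the question. Inside the loop, c names the thread-private copy, so that copy is what the lambda increments.

```cpp
#include <vector>

// Hypothetical variant (not from the question): the lambda captures only
// mask by value and receives the counter explicitly as a reference.
unsigned count_param(std::vector<unsigned> const& num, unsigned mask)
{
    unsigned c = 0;
    auto f = [mask] (unsigned i, unsigned& counter) { if (i & mask) ++counter; };
    #pragma omp parallel for reduction(+:c)
    for (unsigned i = 0; i < num.size(); ++i)
        f(num[i], c);   // here c refers to the thread-private reduction copy
    return c;
}
```

Without OpenMP enabled, the pragma is ignored and the function simply runs serially, so it should return the same count either way.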
Upvotes: 7