Walter

Reputation: 45444

omp reduction and lambda functions

The following simple code doesn't give the expected result with gcc 4.7.0. Is this correct or a bug?

  unsigned count_err(std::vector<unsigned> const&num, unsigned mask)
  {
    unsigned c=0;
    // defined outside the loop so the lambda can be reused later (not in this simple example)
    auto f = [&] (unsigned i) { if(i&mask) ++c; };
#pragma omp parallel for reduction(+:c)
    for(unsigned i=0; i<num.size(); ++i)
      f(num[i]);
    return c;
  }

This returns zero: the increments made through the lambda are not picked up by the reduction of c. I expected the same result as the one returned by the serial function:

  unsigned count_ser(std::vector<unsigned> const&num, unsigned mask)
  {
    unsigned c=0;
    auto f = [&] (unsigned i) { if(i&mask) ++c; };
    std::for_each(num.begin(),num.end(),f);
    return c;
  }

The following implementations give the expected result (in both cases, the code that increments the reduction variable is moved into the parallel region):

  unsigned count_ok1(std::vector<unsigned> const&num, unsigned mask)
  {
    unsigned c=0;
    auto f = [&] (unsigned i) -> bool { return i&mask; };
#pragma omp parallel for reduction(+:c)
    for(unsigned i=0; i<num.size(); ++i)
      if(f(num[i])) ++c;
    return c;
  }

  unsigned count_ok2(std::vector<unsigned> const&num, unsigned mask)
  {
    unsigned c=0;
#pragma omp parallel reduction(+:c)
    {
      auto f = [&] (unsigned i) { if(i&mask) ++c; };
#pragma omp for
      for(unsigned i=0; i<num.size(); ++i)
        f(num[i]);
    }
    return c;
  }

Is the different result from count_err() a compiler bug, or is it correct behaviour?

Upvotes: 2

Views: 2129

Answers (1)

Alexander Chertov

Reputation: 2108

I think it's not a compiler bug. Here is my explanation. In your first example, the lambda holds a reference to the original c, the variable declared in count_err() before the parallel region (strictly speaking it is not a global, but it is shared by all threads). The thread-private copies of c are created when we enter the for-loop, after the lambda has already captured the original. So the threads all increment the same shared variable (without any synchronization). When we exit the loop, the thread-private copies of c (all still equal to zero, because the lambda doesn't know about them) are summed up to give you 0. The count_ok2 version works because the lambda is created inside the parallel region, so it holds a reference to each thread's private copy of c.
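To make the capture behaviour concrete, here is a minimal sketch. The names shadow_demo and count_param are hypothetical, not from the question. shadow_demo is a purely serial analogy: an inner variable named c stands in for the thread-private copy, and it stays at zero because the lambda was bound to the outer c when it was defined. count_param shows one possible way to keep the lambda reusable, assuming it is acceptable to pass the counter in as a parameter: at the call site inside the loop, c names the private copy, so the reduction sees the increments.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Serial analogy (no OpenMP involved): the inner 'c' plays the role of the
// thread-private copy. Returns {outer c, inner c} after three calls to f.
std::pair<unsigned, unsigned> shadow_demo()
{
  unsigned c = 0;
  auto f = [&] { ++c; };      // the reference is bound to the outer c here
  unsigned inner_after = 0;
  {
    unsigned c = 0;           // shadows the outer c, like a private copy would
    f(); f(); f();            // still increments the outer c
    inner_after = c;          // the inner c was never touched
  }
  return {c, inner_after};
}

// Hypothetical alternative to count_err(): the lambda captures only mask and
// receives the counter explicitly, so nothing is bound before the region.
unsigned count_param(std::vector<unsigned> const& num, unsigned mask)
{
  unsigned c = 0;
  auto f = [mask](unsigned i, unsigned& acc) { if (i & mask) ++acc; };
#pragma omp parallel for reduction(+:c)
  for (std::size_t i = 0; i < num.size(); ++i)
    f(num[i], c);             // inside the loop, c is the thread-private copy
  return c;
}
```

If the pragma is ignored (compiling without OpenMP support), count_param simply runs serially and returns the same count.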

Upvotes: 7
