HEKTO

Reputation: 4191

TBB parallel_reduce - need better understanding of example from the "Developer Reference"

I'm looking at the example on the page that shows how to call parallel_reduce in its imperative form. A copy of the example is below:

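#include <cstddef>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

using namespace tbb;  // blocked_range, split, parallel_reduce

struct Sum {
  float value;
  Sum() : value(0) {}
  // Splitting constructor: the new body starts from 0, independently of s.
  Sum( Sum& s, split ) {value = 0;}
  // Accumulates the elements of subrange r into value.
  void operator()( const blocked_range<float*>& r ) {
    float temp = value;
    for( float* a=r.begin(); a!=r.end(); ++a ) {
        temp += *a;
    }
    value = temp;
  }
  // Merges the partial sum computed by another body into this one.
  void join( Sum& rhs ) {value += rhs.value;}
};

float ParallelSum( float array[], size_t n ) {
  Sum total;
  parallel_reduce( blocked_range<float*>( array, array+n ),
                 total );
  return total.value;
}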
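To see how parallel_reduce drives a Body in the imperative form, here is a serial, TBB-free sketch of the protocol. Range, split and mock_reduce are hypothetical stand-ins written for illustration, not TBB's API or implementation: the range is halved down to a grain size, each right half goes to a body created with the splitting constructor, and that body is then merged back with join. The real scheduler splits bodies lazily, only when it needs another one (roughly, when another worker thread picks up part of the range), so a single body can be applied to several subranges in turn; that is why operator() adds to value instead of overwriting it.

```cpp
#include <cstddef>

// Hypothetical stand-in for tbb::blocked_range<float*>: a half-open range [b, e).
struct Range {
    float* b;
    float* e;
    float* begin() const { return b; }
    float* end() const { return e; }
    std::size_t size() const { return static_cast<std::size_t>(e - b); }
};

// Hypothetical stand-in for tbb::split: a tag type that selects the splitting constructor.
struct split {};

// Same Body as the reference example, written against the stand-in types.
struct Sum {
    float value;
    Sum() : value(0) {}
    Sum(Sum&, split) : value(0) {}  // a split-off body starts from the identity, 0
    void operator()(const Range& r) {
        float temp = value;  // continue from whatever this body already holds
        for (float* a = r.begin(); a != r.end(); ++a) temp += *a;
        value = temp;
    }
    void join(Sum& rhs) { value += rhs.value; }
};

// Serial mock of the imperative reduction protocol (hypothetical, not TBB's code):
// split the range while it is larger than grain, hand the right half to a freshly
// split-constructed body, then join that body's partial sum into the left one.
void mock_reduce(const Range& r, Sum& body, std::size_t grain = 4) {
    if (r.size() <= grain) {
        body(r);
        return;
    }
    float* mid = r.b + r.size() / 2;
    Sum right(body, split{});
    mock_reduce(Range{r.b, mid}, body, grain);
    mock_reduce(Range{mid, r.e}, right, grain);
    body.join(right);
}

float MockParallelSum(float array[], std::size_t n) {
    Sum total;
    mock_reduce(Range{array, array + n}, total);
    return total.value;
}
```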

My question is: why do we need the float temp variable in the body of operator()? What might happen if the summation worked directly with the value data member:

for( float* a=r.begin(); a!=r.end(); ++a ) value += *a;

Upvotes: 1

Views: 316

Answers (1)

Anton

Reputation: 6537

It will work if you use value directly, because each instance of the class is used by only one thread at a time.
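The point that accumulating straight into the member is safe can be illustrated with plain std::thread. In this sketch, DirectSum and TwoThreadSum are hypothetical names, and the fixed two-way split is a simplification of what TBB actually does: each body is accumulated by exactly one thread, and join runs only after both threads have finished, so the unguarded value += *a never races.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical Body written the way the question proposes: it accumulates
// straight into the data member, with no local temporary.
struct DirectSum {
    float value = 0;
    void run(const float* first, const float* last) {
        for (const float* a = first; a != last; ++a) value += *a;
    }
    void join(const DirectSum& rhs) { value += rhs.value; }
};

// Hypothetical two-way reduction: each body is owned by exactly one thread
// while it accumulates, so the writes to value need no synchronization.
float TwoThreadSum(const float* array, std::size_t n) {
    DirectSum left, right;  // one body per thread, like a split-off copy
    const float* mid = array + n / 2;
    std::thread t1([&] { left.run(array, mid); });
    std::thread t2([&] { right.run(mid, array + n); });
    t1.join();  // wait until both threads are done with their bodies
    t2.join();
    left.join(right);  // combine only once no thread touches either body
    return left.value;
}
```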

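However, using the member directly might prevent the compiler from vectorizing the loop. The likely reason is aliasing: a is a float* and value is a float, so the compiler has to assume that a store to value could change what a later *a reads, and it keeps writing value back to memory on every iteration. I'd go one step further with the same logic and cache r.end() too, because if that call is not inlined properly, it can break vectorization as well. This is not really related to TBB itself, though; these are just general C++ optimization tricks.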
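The three ways of writing the loop can be compared side by side. Range is a hypothetical stand-in for tbb::blocked_range<float*> so that the sketch compiles without TBB. All three variants compute the same sum; the differences only matter to the optimizer. One caveat, assuming GCC or Clang: a floating-point sum is usually vectorized only when reassociation is allowed (for example with -ffast-math or -fassociative-math), because reordering the additions can change the rounded result. A local accumulator removes an obstacle to vectorization but does not by itself guarantee it.

```cpp
#include <cstddef>

// Hypothetical stand-in for tbb::blocked_range<float*>.
struct Range {
    float* b;
    float* e;
    float* begin() const { return b; }
    float* end() const { return e; }
};

struct Sum {
    float value = 0;

    // Variant 1 (the question's version): accumulate into the member. Since a
    // may point at value, the compiler must keep value in memory, not a register.
    void direct(const Range& r) {
        for (float* a = r.begin(); a != r.end(); ++a) value += *a;
    }

    // Variant 2 (the reference example): a local accumulator cannot be aliased
    // through a, so it can stay in a register for the whole loop.
    void with_temp(const Range& r) {
        float temp = value;
        for (float* a = r.begin(); a != r.end(); ++a) temp += *a;
        value = temp;
    }

    // Variant 3 (the answer's suggestion): also cache r.end(), so the loop
    // bound is a plain local even if end() were not inlined.
    void with_temp_and_end(const Range& r) {
        float temp = value;
        float* const end = r.end();
        for (float* a = r.begin(); a != end; ++a) temp += *a;
        value = temp;
    }
};
```

Because each variant starts from the current value, calling one of them on consecutive subranges keeps accumulating, just as a Body does when TBB applies it to several subranges.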

Upvotes: 3
