Reputation: 4191
I'm looking at the example on the page that shows how to call parallel_reduce
in imperative form. A copy of the example is below:
    struct Sum {
        float value;
        Sum() : value(0) {}
        Sum( Sum& s, split ) {value = 0;}
        void operator()( const blocked_range<float*>& r ) {
            float temp = value;
            for( float* a=r.begin(); a!=r.end(); ++a ) {
                temp += *a;
            }
            value = temp;
        }
        void join( Sum& rhs ) {value += rhs.value;}
    };

    float ParallelSum( float array[], size_t n ) {
        Sum total;
        parallel_reduce( blocked_range<float*>( array, array+n ),
                         total );
        return total.value;
    }
My question is: why do we need the float temp variable in the operator() body? What might happen if the summation worked directly on the value data member:
    for( float* a=r.begin(); a!=r.end(); ++a ) value += *a;
Upvotes: 1
Views: 316
Reputation: 6537
It will work when applied directly, because each instance of the class is used by only one thread at a time.
However, accumulating directly into the data member might prevent the compiler from vectorizing the loop: value is accessed through this, and the compiler may have to assume that the float* being read could alias it, so it writes the member back to memory on every iteration. I'd go one step further with this logic and cache r.end() in a local variable, because if the call is not inlined properly it can break vectorization as well. None of this is specific to TBB, though; these are general C++ optimization tricks.
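As a rough sketch of both tricks together (local accumulator plus a cached end pointer): the FloatRange struct and the SumInTwoHalves driver below are hypothetical stand-ins I made up so the snippet compiles without TBB; with the real library you would keep blocked_range<float*>, the splitting constructor, and parallel_reduce as in the question.

```cpp
#include <cstddef>

// Hypothetical stand-in for tbb::blocked_range<float*>, so the sketch
// compiles without TBB; only begin() and end() are modeled.
struct FloatRange {
    float* first;
    float* last;
    float* begin() const { return first; }
    float* end() const { return last; }
};

struct Sum {
    float value;
    Sum() : value(0) {}
    void operator()(const FloatRange& r) {
        float temp = value;          // accumulate in a local, not through `this`
        float* const end = r.end();  // read end() once, outside the loop
        for (float* a = r.begin(); a != end; ++a) {
            temp += *a;
        }
        value = temp;                // single write-back to the member
    }
    void join(const Sum& rhs) { value += rhs.value; }
};

// Serial driver for illustration only: processes the two halves of the
// array with separate Sum objects and joins them, mimicking the
// split/join pattern of a parallel reduction.
float SumInTwoHalves(float array[], std::size_t n) {
    Sum left, right;
    left(FloatRange{array, array + n / 2});
    right(FloatRange{array + n / 2, array + n});
    left.join(right);
    return left.value;
}
```

Whether the compiler actually vectorizes the loop still depends on the compiler and its flags (a floating-point reduction often needs relaxed FP semantics to be reordered), so it is worth checking the generated code or the vectorization report.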
Upvotes: 3