Kamal Saha

Reputation: 1

STL parallel accumulate function wrong?

Does anyone know why the following program gives wrong results when run on more than one thread? Is there a limit on the length of the vector passed to accumulate? It gives the wrong answer when the vector has more than 999 elements.

ksaha@cswarmfe:g++ -fopenmp test2.cc 

ksaha@cswarmfe:export OMP_NUM_THREADS=1
ksaha@cswarmfe:./a.out 
result 4000
Time 0.000114875

ksaha@cswarmfe:export OMP_NUM_THREADS=2
ksaha@cswarmfe:./a.out 
0.000000e+00, 1.998000e+03
3.996000e+03, 1.998000e+03
result 7992
Time 0.000231437

ksaha@cswarmfe:export OMP_NUM_THREADS=4
ksaha@cswarmfe:./a.out 
0.000000e+00, 9.980000e+02
1.996000e+03, 9.980000e+02
3.992000e+03, 9.980000e+02
5.988000e+03, 9.980000e+02
result 7984
Time 0.000265011

//============================================================
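#include <cstdio>             // printf
#include <vector>
#include <iostream>
#include <numeric>
#include <parallel/numeric>   // libstdc++ parallel mode algorithms
#include <omp.h>              // omp_get_wtime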

namespace par = std::__parallel;


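// Binary operation passed to accumulate.
// Prints every call in which y is not a plain vector entry (2.0).
double myfunction (double x, double y) {
  if (y != 2) printf("%e, %e\n", x, y);
  return x + 2.0*y;
}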


int main(int argc, char **argv)
{

  double t = omp_get_wtime();

  int err = 0;
  std::vector<double> vec(1000,2.0);

  for (int i=0; i<1000; i++)
    if (vec[i]!=2) std::cout << vec[i] << "+++" << std::endl;


  double init = 0.0;

  // parallel
  double result = par::accumulate(vec.begin(),vec.end(),init,myfunction);
  std::cout << "result " << result << std::endl;

  std::cout << "Time " << omp_get_wtime()-t << std::endl;

  return err;
}

Upvotes: 0

Views: 214

Answers (2)

Maxim Razin

Reputation: 9466

Your binary operation (x+2*y) is not associative, so the result depends on the order of operations.

Upvotes: 0

happydave

Reputation: 7187

To get consistent results, your myfunction needs to be associative. In serial mode, accumulate processes one element at a time, so myfunction is always called with x being the accumulated value and y being an entry from the vector. The total is therefore 2 times the sum of all of the vector's entries: 2 * (1000 * 2.0) = 4000.

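But when called in parallel, both x and y may be accumulated values (partial results), and if your myfunction is not associative, you'll get a different result depending on the processing order. Your debug output shows exactly this: with 2 threads, myfunction is called with y = 1.998e+03, which is a partial result rather than a vector entry.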

For example, with 4 elements in your vector, the serial version would result in a total of 16, but the parallel version may be processed as follows, giving 24:

0.0  2.0      0.0   2.0
 \   /          \  /
  4.0    2.0    4.0    2.0  
     \  /          \   /
      8.0           8.0
          \       /
            24.0

Upvotes: 3
