Reputation: 1
Does anyone know why the following program gives wrong results when run with more than one thread? Is there a limit on the length of the vector passed to accumulate? The answer is wrong when the length of the vector is > 999.
ksaha@cswarmfe:g++ -fopenmp test2.cc
ksaha@cswarmfe:export OMP_NUM_THREADS=1
ksaha@cswarmfe:./a.out
result 4000
Time 0.000114875
ksaha@cswarmfe:export OMP_NUM_THREADS=2
ksaha@cswarmfe:./a.out
0.000000e+00, 1.998000e+03
3.996000e+03, 1.998000e+03
result 7992
Time 0.000231437
ksaha@cswarmfe:export OMP_NUM_THREADS=4
ksaha@cswarmfe:./a.out
0.000000e+00, 9.980000e+02
1.996000e+03, 9.980000e+02
3.992000e+03, 9.980000e+02
5.988000e+03, 9.980000e+02
result 7984
Time 0.000265011
//============================================================
#include <cstdio>    // printf
#include <iostream>
#include <numeric>
#include <vector>
#include <omp.h>     // omp_get_wtime
#include <parallel/numeric>
namespace par = std::__parallel;
// Binary operation passed to accumulate; prints every call whose y is not a raw element (2.0)
double myfunction (double x, double y) {
    if(y!=2) printf("%e, %e\n",x,y);
    return x+2.0*y;
}
int main(int argc, char **argv)
{
    double t = omp_get_wtime();
    int err = 0;
    std::vector<double> vec(1000,2.0);
    // sanity check: every element should be 2.0
    for (int i=0; i<1000; i++)
        if(vec[i]!=2) std::cout << vec[i] << "+++" << std::endl;
    double init = 0.0;
    // parallel
    double result = par::accumulate(vec.begin(),vec.end(),init,myfunction);
    std::cout << "result " << result << std::endl;
    std::cout << "Time " << omp_get_wtime()-t << std::endl;
    return err;
}
Upvotes: 0
Views: 214
Reputation: 9466
Your binary operation (x+2*y) is not associative, so the result depends on the order of operations.
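A quick way to see this is to group three operands both ways and compare. The sketch below reuses myfunction from the question (without the printf); groups_agree is a hypothetical helper name, not part of any library.

```cpp
// The binary operation from the question, minus the debug printf.
double myfunction(double x, double y) { return x + 2.0 * y; }

// Hypothetical helper: true when both groupings of three operands agree.
bool groups_agree(double a, double b, double c) {
    double left  = myfunction(myfunction(a, b), c);  // (a op b) op c
    double right = myfunction(a, myfunction(b, c));  // a op (b op c)
    return left == right;
}
```

With a = b = c = 2 the left grouping gives 10 and the right grouping gives 14, so groups_agree(2, 2, 2) is false.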
Upvotes: 0
Reputation: 7187
To get consistent results, your myfunction needs to be associative. In serial mode, accumulate processes one element at a time, so myfunction is always called with x being the accumulated value and y being an entry from the vector. The total is therefore 2 times the sum of all of the entries, which is 4000.
But when called in parallel, both x and y may be accumulated values, and if your myfunction is not associative, you'll get a different result, depending on the processing order.
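The numbers printed in the question are consistent with a simple chunked model: each thread seeds its partial result with the first element of its chunk, folds in the rest of the chunk with myfunction, and the partial results are then folded into init from left to right. That model is an assumption inferred from the printed output, not taken from the libstdc++ source, and chunked_accumulate is a hypothetical name. It only covers runs with two or more chunks; the single-thread run appears to use the plain sequential algorithm. The > 999 threshold would likewise fit parallel mode falling back to the sequential algorithm for short inputs (the accumulate_minimal_n setting appears to default to 1000), though that is worth checking against your GCC version.

```cpp
#include <cstddef>
#include <vector>

// The binary operation from the question, minus the debug printf.
double myfunction(double x, double y) { return x + 2.0 * y; }

// Hypothetical serial model of a chunked reduction (an assumption, see above).
// Assumes chunks >= 2 and that chunks divides v.size() evenly.
double chunked_accumulate(const std::vector<double>& v, std::size_t chunks, double init) {
    const std::size_t len = v.size() / chunks;
    double result = init;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * len;
        double partial = v[begin];  // chunk seeded with its own first element
        for (std::size_t i = begin + 1; i < begin + len; ++i)
            partial = myfunction(partial, v[i]);  // y is a raw element here
        result = myfunction(result, partial);     // y is now a partial result
    }
    return result;
}
```

For a 1000-element vector of 2.0 this returns 7992 with 2 chunks and 7984 with 4 chunks, matching the results in the question.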
For example, with 4 elements in your vector, the serial version would result in a total of 16, but the parallel version may be processed as follows, giving 24:
0.0   2.0         0.0   2.0
  \   /             \   /
   4.0   2.0         4.0   2.0
     \   /             \   /
      8.0               8.0
          \           /
              24.0
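One possible fix, sketched under the assumption that what you want is init plus twice the sum of the entries: accumulate with plain addition, which is associative, and apply the factor of 2 once at the end. The sketch uses std::accumulate so that it compiles without OpenMP; the default addition could equally be used with par::accumulate. weighted_sum and weighted_sum_chunked are hypothetical names, and the chunked variant is only there to show that the grouping no longer changes the result.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sum with plain addition, then scale once: init + 2 * (v[0] + v[1] + ...).
double weighted_sum(const std::vector<double>& v, double init) {
    const double sum = std::accumulate(v.begin(), v.end(), 0.0);
    return init + 2.0 * sum;
}

// Same computation, but summing chunk by chunk to mimic a parallel split.
// Assumes chunks divides v.size() evenly.
double weighted_sum_chunked(const std::vector<double>& v, std::size_t chunks, double init) {
    const std::size_t len = v.size() / chunks;
    double sum = 0.0;
    for (std::size_t c = 0; c < chunks; ++c) {
        auto first = v.begin() + static_cast<std::ptrdiff_t>(c * len);
        auto last = first + static_cast<std::ptrdiff_t>(len);
        sum += std::accumulate(first, last, 0.0);  // partial sum of one chunk
    }
    return init + 2.0 * sum;
}
```

For the 1000-element vector of 2.0 both functions return 4000, whether the chunked variant uses 2 or 4 chunks. Floating-point addition is only associative up to rounding, so for general data the parallel and serial sums may still differ in the last few bits.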
Upvotes: 3