Reputation: 1089
After I had read that the initial value of reduction variable is set according to the operator used for reduction, I decided that instead of remembering these default values it is better to initialize it explicitly. So I modified the code in question by Totonga as follows
const int num_steps = 100000;
double x, sum, dx = 1./num_steps;
#pragma omp parallel private(x) reduction(+:sum)
{
sum = 0.;
#pragma omp for schedule(static)
for (int i=0; i<num_steps; ++i)
{
x = (i+0.5)*dx;
sum += 4./(1.+x*x);
}
}
But it turns out that no matter whether I write sum = 0. or sum = 123.456 the code produces the same result (used gcc-4.5.2 compiler). Can somebody, please, explain me why? (with a reference to openmp standard, if possible) Thanks in advance to everybody.
P.S. since some people object initializing reduction variable, I think it makes sense to expand a question a little. The code below works as expected: I initialize reduction variable and obtain result, which DOES depend on MY initial value
int sum;
#pragma omp parallel reduction(+:sum)
{
sum = 1;
}
printf("Reduction sum = %d\n",sum);
The printed result will be the number of cores, and not 0.
P.P.S I have to update my question again. User Gilles gave an insightful comment: And upon exit of the parallel region, these local values will be reduced using the + operator, and with the initial value of the variable, prior to entering the section.
Well, the following code gives me the result 3.142592653598146
, which is badly calculated pi
instead of expected 103.141592653598146
(the initial code was giving me excellent value of pi=3.141592653598146
)
const int num_steps = 100000;
double x, sum, dx = 1./num_steps;
sum = 100.;
#pragma omp parallel private(x) reduction(+:sum)
{
#pragma omp for schedule(static)
for (int i=0; i<num_steps; ++i)
{
x = (i+0.5)*dx;
sum += 4./(1.+x*x);
}
}
Upvotes: 1
Views: 2502
Reputation: 9489
Why would you want to do that? This is just begging with all your soul for troubles. The reduction
clause and the way the local variables used are initialised are defined for a reason, and the idea is that you don't need to remember these initialisation value just because they are already right.
However, in your code, the behaviour is undefined. Let's see why...
Let's assume your initial code is this:
const int num_steps = 100000;
double x, sum, dx = 1./num_steps;
sum = 0.;
for (int i=0; i<num_steps; ++i) {
x = (i+0.5)*dx;
sum += 4./(1.+x*x);
}
Well, the "normal" way of parallelising it with OpenMP would be:
const int num_steps = 100000;
double x, sum, dx = 1./num_steps;
sum = 0.;
#pragma omp parallel for reduction(+:sum) private(x)
for (int i=0; i<num_steps; ++i) {
x = (i+0.5)*dx;
sum += 4./(1.+x*x);
}
Pretty straightforward, isn't it?
Now, when instead of that, you do:
const int num_steps = 100000;
double x, sum, dx = 1./num_steps;
#pragma omp parallel private(x) reduction(+:sum)
{
sum = 0.;
#pragma omp for schedule(static)
for (int i=0; i<num_steps; ++i)
{
x = (i+0.5)*dx;
sum += 4./(1.+x*x);
}
}
You have a problem... The reason is that upon entry into the parallel
region, sum
hadn't been initialised. So when you declare omp parallel reduction(+:sum)
, you create a per-thread private version of sum
, initialised to the "logical" initial value corresponding to the operator of you reduction
clause, namely 0 here because you asked for a +
reduction. And upon exit of the parallel
region, these local values will be reduced using the +
operator, and with the initial value of the variable, prior to entering the section. See this for reference:
The reduction clause specifies a reduction-identifier and one or more list items. For each list item, a private copy is created in each implicit task or SIMD lane, and is initialized with the initializer value of the reduction-identifier. After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier
So in summary, upon exit you have the equivalent of sum += sum_local_0 + sum_local_1 + ... sum_local_nbthreadsMinusOne
Therefore, since in your code, sum
doesn't have any initial value, its value upon exit of the parallel
region isn't defined as well, and can be whatever...
Now let's imagine you did indeed initialise it... Then, if instead of using the right initialiser inside the parallel
region (like your sum=0.;
in the hereinabove code), you used for whatever reason sum=1.;
instead, then the final sum won't be just incremented by 1, but by 1 times the number of threads used inside the parallel
region, since the extra value will be counted as many times as there are of threads.
So in conclusion, just use reduction
clauses and variables the "expected"/"naïve" way, that will spare you and the people coming after for maintaining your code a lot of troubles.
Edit: It looks like my point was not clear enough, so I'll try to explain it better:
this code:
int sum;
#pragma omp parallel reduction(+:sum)
{
sum = 1;
}
printf("Reduction sum = %d\n",sum);
Has an undefined behaviour because it is equivalent to:
int sum, numthreads;
#pragma omp parallel
#pragma omp single
numthreads = omp_get_num_threads();
sum += numthreads; // value of sum is undefined since it never was initialised
printf("Reduction sum = %d\n",sum);
Now, this code is valid:
int sum = 0; //here, sum has been initialised
#pragma omp parallel reduction(+:sum)
{
sum = 1;
}
printf("Reduction sum = %d\n",sum);
To convince yourself, just read the snippet of the standard I gave:
After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier
So the reduction uses the combination of the private reduction variables and the original value to perform the final reduction upon exit. So if the original value wasn't set, the final value is undefined as well. And that's not because for some reason your compiler gives you a value that seems right, that the code is right.
Is that clearer now?
Upvotes: 4