Freelancer

Reputation: 887

Difference between two methods for OpenMP parallel

I want to evaluate the integral of a function numerically with OpenMP, comparing two approaches: storing the value computed at each step in an array and then summing the array, versus accumulating the sum directly without an array.

The code is:

double f(double x)
{
    return sin(x)*sin(x)/(x*x+1);
}

METHOD 1

    long i = 0;
    const long NUM_STEP = 100000;
    double sum[NUM_STEP];
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;

    #pragma omp parallel for shared(sum) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        sum[i] = step*f(from+i*step);
    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    printf("%lf", result);

METHOD 2

    long i = 0;
    const long NUM_STEP = 100000;
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;

    #pragma omp parallel for shared(result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
    printf("%lf", result);

But the results are quite different. METHOD 1 gives a stable value, while METHOD 2 gives a value that changes from run to run. Here is an example:

METHOD 1: 0.178446

METHOD 2: 0.158738

The value from METHOD 1 is correct (verified with another tool).

Upvotes: 1

Views: 163

Answers (1)

dreamcrash

Reputation: 51523

TL;DR The first method does not have a race condition, whereas the second method does.

In the first method:

#pragma omp parallel for shared(sum) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    sum[i] = step*f(from+i*step);
for(i=0; i<NUM_STEP; i++)
    result += sum[i];

each thread saves the result of the operation step*f(from+i*step) in a different position of the array sum. After that, the master thread sequentially reduces the values saved in the array sum, namely:

for(i=0; i<NUM_STEP; i++)
    result += sum[i];

Actually, there is a slight improvement you can make to this version: instead of allocating the array sum with one element per step (NUM_STEP entries), you can allocate it with one element per thread, and have each thread accumulate its partial sum into the position matching its ID, namely:

int total_threads = 4;
double sum[total_threads];                 // one partial result per thread
memset(sum, 0, sizeof sum);                // requires <string.h>
#pragma omp parallel num_threads(total_threads)
{
  int thread_id = omp_get_thread_num();
  double partial = 0.0;
  #pragma omp for                          // divide the iterations among threads
  for(long i = 0; i < NUM_STEP; i++)
      partial += step*f(from+i*step);
  sum[thread_id] = partial;
}
for(int i = 0; i < total_threads; i++)     // final reduction, after the parallel region
    result += sum[i];

Nonetheless, the best approach is to actually fix the second method.

In the second method there is a race condition on the update of the variable result:

#pragma omp parallel for shared(result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    result += step*f(from+i*step);

because the result variable is updated concurrently by multiple threads in a non-thread-safe manner.

To solve this race-condition you need to use the reduction clause:

#pragma omp parallel for reduction(+:result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    result += step*f(from+i*step);

Upvotes: 3
