StephanieLoves

OpenMP reduction on older version without overloading

#pragma omp parallel for    // I want reduction but overloading doesn't work on the version used
for (int i = 0; i < 500; i++)
{
    #pragma omp critical    // only one thread at a time runs the whole j loop
    for (int j = i; j < 102342; j++)
    {
        Output[j] += staticConstant[i] * data[j-i];
    }
}

Is there a way to get a reduction to work here? Making local private copies doesn't speed it up.
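For reference, here is a minimal sketch of the hand-rolled array reduction that "local private copies" usually means. It only needs OpenMP features that predate built-in array reductions. The function name `conv_private_copies` and the parameters `n_taps`/`n_out` (standing in for 500 and 102342) are hypothetical; this illustrates the pattern and is not a performance recommendation.

```c
#include <stdlib.h>

/* Hand-rolled array reduction (hypothetical names): every thread accumulates
 * into its own zeroed buffer, then the buffers are merged into out one thread
 * at a time. Only uses parallel, for and critical. */
void conv_private_copies(double *out, const double *coef, const double *data,
                         int n_taps, int n_out)
{
    #pragma omp parallel
    {
        /* Thread-private partial result; calloc zero-fills it. */
        double *partial = calloc((size_t)n_out, sizeof *partial);
        if (partial == NULL && n_out > 0)
            abort();            /* keep the sketch short: no recovery path */

        /* Split the i (tap) loop across threads. Writes go to the private
         * buffer, so threads never update the same memory here. */
        #pragma omp for
        for (int i = 0; i < n_taps; i++)
            for (int j = i; j < n_out; j++)
                partial[j] += coef[i] * data[j - i];

        /* Merge: one thread at a time adds its partial sums into out. */
        #pragma omp critical
        for (int j = 0; j < n_out; j++)
            out[j] += partial[j];

        free(partial);
    }
}
```

The zeroed buffer of `n_out` elements per thread and the serialized merge are the extra costs compared with a plain parallel loop.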

Answers (2)

Gilles Gouaillardet

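In this case, your best bet is to swap the loops. Each thread then owns a distinct set of Output[j] elements, so no reduction or synchronization is needed: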

#pragma omp parallel for
for (int j = 0; j < 102342; j++)
    for (int i = 0; i <= j && i < 500; i++)   // i.e. i <= min(j, 499)
        Output[j] += staticConstant[i] * data[j-i];
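The same loop swap as a self-contained function, as a sketch. The name `conv_swapped` and the parameters `n_taps`/`n_out` are hypothetical. Accumulating into a local variable is my own addition. It keeps `out[j]` out of memory inside the inner loop and still adds the terms in the same order as the original code.

```c
/* Loop-swapped convolution (hypothetical names). Each j is handled by
 * exactly one thread, so out[j] needs no atomic, critical or reduction. */
void conv_swapped(double *out, const double *coef, const double *data,
                  int n_taps, int n_out)
{
    #pragma omp parallel for
    for (int j = 0; j < n_out; j++) {
        double sum = out[j];                   /* local accumulator */
        for (int i = 0; i <= j && i < n_taps; i++)
            sum += coef[i] * data[j - i];      /* same i order as the original */
        out[j] = sum;
    }
}
```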

Another (but sub-optimal) option is to use atomics, since every single update of Output[j] then becomes an atomic operation:

#pragma omp parallel for
for (int i = 0; i < 500; i++)
    for (int j = i; j < 102342; j++)   // j must be private: declare it in the loop
    {
        #pragma omp atomic
        Output[j] += staticConstant[i] * data[j-i];
    }

Gilles

Forget about using a reduction in this case: the overhead it would incur (a reduction variable with 100,000+ elements) would likely wipe out most of the gain from the parallelization. Just stick to simple parallelization constructs.

Ideally, you'd want to parallelize the j loop, since it carries no dependencies between iterations. You can simply do it like this:

#pragma omp parallel
for ( int i = 0; i < 500; i++ ) {
    #pragma omp for
    for ( int j = i; j < 102342; j++ ) {
        Output[j] += staticConstant[i] * data[j-i];
    }
}

For code as simple as this, that should be enough.
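As a sketch, the same idea wrapped in a function (the name `conv_inner_for` and the parameters `n_taps`/`n_out`, standing in for 500 and 102342, are hypothetical). Every thread runs the i loop, and each `omp for` splits one j loop among them. The implicit barrier at the end of each `omp for` keeps a thread from starting row i+1 while another thread may still be updating the same `out[j]` for row i.

```c
/* Worksharing inside a parallel region (hypothetical names). */
void conv_inner_for(double *out, const double *coef, const double *data,
                    int n_taps, int n_out)
{
    #pragma omp parallel
    for (int i = 0; i < n_taps; i++) {
        /* Every thread runs this i loop with the same bounds, so all threads
         * meet the same sequence of worksharing constructs. */
        #pragma omp for
        for (int j = i; j < n_out; j++)
            out[j] += coef[i] * data[j - i];
        /* Implicit barrier here (no nowait clause): row i is finished by
         * all threads before any thread moves on to row i + 1. */
    }
}
```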

Now, you might want to go a step further and try swapping the i and j loops to make the parallelization more effective (though I'm not sure it will make much of a difference). For that, you need to remove the dependency on i in the initialization of the j loop. Here is one way of doing it:

// first, do the j < 500 iterations, whose range depends on i (serially)
for ( int i = 0; i < 500; i++ ) {
    for ( int j = i; j < 500; j++ ) {
        Output[j] += staticConstant[i] * data[j-i];
    }
}
// then all the other iterations, with the i and j loops swapped;
// each j is now independent, so the j loop can be parallelized safely
#pragma omp parallel for
for ( int j = 500; j < 102342; j++ ) {
    for ( int i = 0; i < 500; i++ ) {
        Output[j] += staticConstant[i] * data[j-i];
    }
}
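A minimal sketch of this split-and-swap version with the constants turned into parameters (the names `conv_split_swap`, `n_taps` and `n_out` are hypothetical). Clamping the split point to `n_out` is my own assumption, so the function still behaves if there are more taps than outputs, a case the original code never meets.

```c
/* Split-and-swap convolution (hypothetical names). */
void conv_split_swap(double *out, const double *coef, const double *data,
                     int n_taps, int n_out)
{
    int split = n_taps < n_out ? n_taps : n_out;

    /* Head: j < split, only taps i <= j contribute (serial, small). */
    for (int i = 0; i < split; i++)
        for (int j = i; j < split; j++)
            out[j] += coef[i] * data[j - i];

    /* Body: j >= split, every tap contributes and j - i stays >= 0.
     * Each j is written by exactly one thread. */
    #pragma omp parallel for
    for (int j = split; j < n_out; j++)
        for (int i = 0; i < n_taps; i++)
            out[j] += coef[i] * data[j - i];
}
```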
