KBriggs
KBriggs

Reputation: 1412

Parallelizing a loop using OpenMP when loop iterations are not independent

I have an implementation of a digital bessel filter which is a performance bottleneck for my program. I would like to parallelize the main loop with OpenMP.

The function takes an input array paddedsignal and arrays of filter coefficients dcof and ccof and produces a temporary array temp which is filtered. The main loop of this process looks like this:

    for (i=order; i<end; i++)
    {
        temp[i] = ccof[0]*paddedsignal[i];
        for (p=1; p<=order; p++)
        {
            temp[i] += ccof[p]*paddedsignal[i-p] - dcof[p]*temp[i-p];
        }
    }

In particular, note that the value of temp[i] depends on previous values of temp[i-p]. This confounds a simple #pragma omp for directive, since the value of temp[i] near the boundaries between sections of the array which would be handled by different threads has race condition problems. Basically if thread one takes indices between 0-99 and thread 2 takes 100-199, the value at 100 will be wrong since it will be computed before the value at 99, on which it depends.

Is there a way to salvage this situation? I am despairing as to what I can do to parallelize this since the fact that filtered values depend on neighboring values makes it inherently a serial calculation. Is there any way around this?

Upvotes: 1

Views: 485

Answers (2)

Zulan
Zulan

Reputation: 22660

You basically cannot parallelize the outer loop due to the dependency you correctly identify.

You could parallelize the inner loop with a reduction. But that would only be helpful for large order and even then probably just lead to false sharing because the data is shuffled around. One might concieve a complicated data laout to interleave signal and temp among threads, which would require sort-of a manual work sharing construct. I doubt it would be worth it anyway, since the reduction will add overhead cost.

It is easier to parallelize filters that have a finite impulse response, if that is an alternative for you. You could also segment the whole signal into slightly overlapping segments and compute the filter independently, but there will be some computational error at the borders of those chunks.

Upvotes: 1

QuickSort
QuickSort

Reputation: 664

Maybe you are looking for the #pragma omp for ordered. It executes the code in the same order in which it would be executed sequentially

But keep in mind:

  • can only be used in the dynamic extent of a for directive
  • very "expensive" (it's likely that your speedup will not be as good as you might think)

For more information see: omp ordered

Upvotes: 0

Related Questions