Parallel cumulative (prefix) sums in OpenMP: communicating values between threads

Question

Assume I have a function f(i) that depends on an index i (and on other values that cannot be precomputed). I want to fill an array a so that a[n] = f(0) + f(1) + ... + f(n), i.e. the sum of f(i) from i=0 to n.

Edit: After a comment by Hristo Iliev I realized what I am doing is a cumulative/prefix sum.

This can be written in code as

float sum = 0;
for(int i=0; i<N; i++) {
    sum += f(i);
    a[i] = sum;
}
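A serial reference is handy for checking any parallel version against the loop above. This is a minimal sketch that assumes f(i) = i, the test case used further down. The function name `serial_prefix` and the use of `long long` instead of `float` (so results compare exactly) are illustrative choices. `std::partial_sum` from `<numeric>` computes the same inclusive running sum.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the expensive f(i); f(i) = i is the question's test case.
long long f(int i) { return i; }

// Serial reference: a[i] = f(0) + f(1) + ... + f(i) (inclusive prefix sum).
std::vector<long long> serial_prefix(int N) {
    assert(N >= 0);  // a negative size would wrap around in the vector constructor
    std::vector<long long> a(N);
    for (int i = 0; i < N; i++) a[i] = f(i);
    std::partial_sum(a.begin(), a.end(), a.begin());  // in-place running sum
    return a;
}
```

Writing the output over the input (`d_first == first`) is permitted for `std::partial_sum`, so no second buffer is needed.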



Now I want to use OpenMP to do this in parallel. One way to do this with OpenMP is to compute the values of f(i) in parallel and then handle the dependency serially. If f(i) is a slow function, this could work well, since the non-parallelized loop is simple.

#pragma omp parallel for
for(int i=0; i<N; i++) {
    a[i] = f(i);
}
for(int i=1; i<N; i++) {
    a[i] += a[i-1];
}
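The two-pass idea can be packaged as a self-contained function. This is a sketch that assumes f(i) = i and uses the hypothetical name `two_pass_prefix`. Compile it with `-fopenmp`; without that flag the pragma is ignored and the same result is computed serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical slow function; f(i) = i matches the test case in the question.
long long f(int i) { return i; }

// Two-pass prefix sum: the expensive f(i) calls run in parallel,
// the cheap loop-carried accumulation runs serially.
std::vector<long long> two_pass_prefix(int N) {
    assert(N >= 0);
    std::vector<long long> a(N);
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = f(i);           // independent iterations: safe to parallelize
    }
    for (int i = 1; i < N; i++) {
        a[i] += a[i - 1];      // a[i] depends on a[i-1]: kept serial
    }
    return a;
}
```

The serial pass costs one addition per element, so this pays off mainly when f(i) dominates the runtime.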


But it's possible to do this with OpenMP without the serial loop. The solution I have come up with, however, is complicated and perhaps hackish. So my question is whether there is a simpler, less convoluted way to do this with OpenMP.

The code below basically runs the first loop I listed in each thread, over that thread's chunk of the array. As a result, the values of a within a given thread are correct up to a constant. I save the sum for each thread to an array suma with nthreads+1 elements. This lets the threads communicate and determine the constant offset for each thread. Then I correct the values of a[i] with that offset.

float *suma;
#pragma omp parallel
{
    const int ithread = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();
    const int start = ithread*N/nthreads;
    const int finish = (ithread+1)*N/nthreads;
    #pragma omp single
    {
        suma = new float[nthreads+1];
        suma[0] = 0;
    }
    float sum = 0;
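    for (int i=start; i<finish; i++) {
        sum += f(i);
        a[i] = sum;          // correct up to a per-thread constant
    }
    suma[ithread+1] = sum;   // total of this thread's chunk
    #pragma omp barrier      // wait until every thread has written its total
    float offset = 0;
    for (int i=0; i<=ithread; i++) {
        offset += suma[i];   // suma[0] = 0 plus the totals of all earlier chunks
    }
    for (int i=start; i<finish; i++) {
        a[i] += offset;
    }
}
delete[] suma;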
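The per-thread offset scheme described above can be written as a function that compiles with or without OpenMP. This is a sketch, not the asker's exact code: the name `offset_prefix`, the `long long` element type, the use of a `std::vector` for `suma`, and the serial fallbacks for `omp_get_thread_num`/`omp_get_num_threads` are all assumptions for illustration. It relies on the implicit barrier at the end of `single` so that `suma` is sized before any thread writes to it.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption) so the sketch still compiles without -fopenmp.
static int omp_get_thread_num() { return 0; }
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical stand-in for f(i); f(i) = i as in the question's test.
long long f(int i) { return i; }

// Per-thread prefix sums, then a per-thread offset correction.
std::vector<long long> offset_prefix(int N) {
    assert(N >= 0);
    std::vector<long long> a(N);
    std::vector<long long> suma;  // suma[t+1] = total of thread t's chunk
    #pragma omp parallel
    {
        const int ithread  = omp_get_thread_num();
        const int nthreads = omp_get_num_threads();
        // 64-bit intermediate avoids overflow of ithread*N for large N
        const int start  = (int)((long long)ithread * N / nthreads);
        const int finish = (int)((long long)(ithread + 1) * N / nthreads);
        #pragma omp single
        {
            suma.assign(nthreads + 1, 0);  // implicit barrier at the end of single
        }
        long long sum = 0;
        for (int i = start; i < finish; i++) {
            sum += f(i);
            a[i] = sum;                    // correct up to a per-thread constant
        }
        suma[ithread + 1] = sum;
        #pragma omp barrier                // all chunk totals are now visible
        long long offset = 0;
        for (int t = 0; t <= ithread; t++) offset += suma[t];
        for (int i = start; i < finish; i++) a[i] += offset;
    }
    return a;
}
```

Each thread reads f only for its own chunk, so f is evaluated exactly once per index; the extra cost is a second pass over the chunk to add the offset.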


A simple test is just to set f(i) = i.  Then the solution is a[i] = i*(i+1)/2 (and at infinity it's -1/12).
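For f(i) = i the expected values follow from the arithmetic-series formula. Evaluating it at the last index, n - 1, gives the reference value n*(n-1)/2 that the program in the accepted answer prints, which is 49995000 for its n = 10000:

```latex
a_i = \sum_{j=0}^{i} j = \frac{i(i+1)}{2},
\qquad
a_{n-1} = \frac{(n-1)\,n}{2} = \frac{9999 \cdot 10000}{2} = 49\,995\,000 \quad (n = 10000)
```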

Massimiliano · Accepted Answer

You can extend your strategy to an arbitrary number of sub-regions, and reduce them recursively, using tasks:

#include <iostream>
#include <vector>

using namespace std;

const int n          = 10000;
const int baseLength = 100;

int f(int ii) {
  return ii;
}

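// Computes an in-place inclusive prefix sum of [begin, end) and returns
// the last element, i.e. the total of the range.
int recursiveSumBody(int * begin, int * end){

  size_t length  = end - begin;
  size_t mid     = length/2;
  int    sum     = 0;


  if ( length < baseLength ) {
    // Base case: small range, plain serial scan.
    for(size_t ii = 1; ii < length; ii++ ){
        begin[ii] += begin[ii-1];
    }
  } else {
    // Scan both halves concurrently; keep the total of the left half.
#pragma omp task shared(sum)
    {
      sum = recursiveSumBody(begin    ,begin+mid);
    }
#pragma omp task
    {
      recursiveSumBody(begin+mid,end      );
    }
#pragma omp taskwait

    // Add the left half's total to every element of the right half.
    // This nested parallel region typically runs with a single thread
    // unless nested parallelism is enabled.
#pragma omp parallel for
    for(size_t ii = mid; ii < length; ii++) {
      begin[ii] += sum;
    }

  }
  return begin[length-1];
}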

void recursiveSum(int * begin, int * end){

#pragma omp single
  {
    recursiveSumBody(begin,end);
  }    
}


int main() {

  vector<int> a(n,0);

#pragma omp parallel
  {
    #pragma omp for
    for(int ii=0; ii < n; ii++) {          
      a[ii] = f(ii);
    }  

    recursiveSum(a.data(), a.data() + n);

  }
  cout << n*(n-1)/2 << endl;
  cout << a[n-1] << endl;

  return 0;
}
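Since OpenMP 5.0 there is also a `scan` directive with `inscan` reductions that expresses exactly this dependency, which may answer the "simpler way" part of the question. This is a sketch, not part of the answer above: it assumes a compiler that implements OpenMP 5.0 inscan reductions (for example a recent GCC with `-fopenmp`), f(i) = i, and the hypothetical name `scan_prefix`. Without `-fopenmp` the pragmas are ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for f(i); f(i) = i as in the question's test.
long long f(int i) { return i; }

// Inclusive prefix sum using the OpenMP 5.0 scan directive.
std::vector<long long> scan_prefix(int N) {
    assert(N >= 0);
    std::vector<long long> a(N);
    long long sum = 0;
    #pragma omp parallel for reduction(inscan, +: sum)
    for (int i = 0; i < N; i++) {
        sum += f(i);                      // input phase: update the scan variable
        #pragma omp scan inclusive(sum)
        a[i] = sum;                       // scan phase: sum covers f(0)..f(i)
    }
    return a;
}
```

The statements before `#pragma omp scan` form the input phase and those after it the scan phase; with `inclusive(sum)` the scan phase sees a value that already includes the current iteration, matching the serial loop in the question.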
