Cluner
Cluner

Reputation: 13

OpenMP: Prefix Sum Algorithm

I'm trying to implement a Prefix Sum Algorithm in C using OpenMP, and I'm stuck.

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main(int argc, char* argv[])
{
    int p = 5;
    int X[5] = { 1, 5, 4, 2, 3 };
    int* Y = (int*)malloc(p * sizeof(int));

    for (int i = 0; i < p; i++)
        printf("%d ", X[i]);
    printf("\n");

    Y[0] = X[0];

    int i;
    #pragma omp parallel for num_threads(4)
    for (i = 1; i < p; i++)
        Y[i] = X[i - 1] + X[i];

    int k = 2;
    while (k < p)
    {
        int i;
        #pragma omp parallel for
        for (i = k; i < p; i++)
            Y[i] = Y[i - k] + Y[i];
        k += k;
    }

    for (int i = 0; i < p; i++)
        printf("%d ", Y[i]);
    printf("\n");

    system("pause");
    return 0;
}

What this code should do?

Input numbers are in X,
output numbers are (prefixes) in Y
and the number count is p.

X = 1, 5, 4, 2, 3

Stage I.

Y[0] = X[0];

Y[0] = 1

Stage II.

int i;
#pragma omp parallel for num_threads(4)
for (i = 1; i < p; i++)
    Y[i] = X[i - 1] + X[i];

Example:

Y[1] = X[0] + X[1] = 6
Y[2] = X[1] + X[2] = 9
Y[2] = X[2] + X[3] = 6
Y[4] = X[3] + X[4] = 5

Stage III. (where I am stuck)

int k = 2;
while (k < p)
{
    int i;
    #pragma omp parallel for
    for (i = k; i < p; i++)
        Y[i] = Y[i - k] + Y[i];
    k += k;
}

Example:

k = 2
Y[2] = Y[0] + Y[2] = 1 + 9 = 10
Y[3] = Y[1] + Y[3] = 6 + 6 = 12
Y[4] = Y[2] + Y[4] = 10 + 5 = 15

Above the 10 + 5 = 15 should be 9 + 5 = 14, but the Y[2] was overwritten by another thread. I want to use that Y[2] what was before the for-loop started.

Example:

k = 4
Y[4] = Y[0] + Y[4] = 1 + 15 = 16

Result: 1, 6, 10, 12, 16. Expected good result: 1, 6, 10, 12, 15.

Upvotes: 1

Views: 1220

Answers (2)

Cluner
Cluner

Reputation: 13

UPDATE for Stage III.

int num_threads = 8;
int k = 2;
while (k < p)
{
    #pragma omp parallel for ordered num_threads(k < num_threads ? 1 : num_threads)
    for (i = p - 1; i >= k; i--)
    {
        Y[i] = Y[i - k] + Y[i];
    }
    k += k;
}

The code above solved my problem. It's now working with parallel, except the first few round.

Upvotes: 0

John Bollinger
John Bollinger

Reputation: 180371

Above the 10 + 5 = 15 should be 9 + 5 = 14, but the Y[2] was overwritten by another thread. I want to use that Y[2] what was before the for-loop started.

With OpenMP, you always have to consider whether your code is correct for the serial case, with a single thread, because

  1. It might in fact run that way, and
  2. If it's incorrect serially, then it's virtually certain to be incorrect as a parallel program, too.

Your code is not correct serially. It appears you could fix that by running the problem loop backward, from i = p - 1 to k, but in fact that's not sufficient for parallel operation.

Your best bet appears to be to accumulate your partial results into a different array than holds the results of the previous cycle. For example, you might flip between X and Y as data source and result, with a little pointer wrangling to grease the iterative wheels. Or you might do it a little more easily by using a 2D array instead of separate X and Y.

Upvotes: 1

Related Questions