Reputation: 13
I'm trying to implement a Prefix Sum Algorithm in C using OpenMP, and I'm stuck.
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
int main(int argc, char* argv[])
{
int p = 5;
int X[5] = { 1, 5, 4, 2, 3 };
int* Y = (int*)malloc(p * sizeof(int));
for (int i = 0; i < p; i++)
printf("%d ", X[i]);
printf("\n");
Y[0] = X[0];
int i;
#pragma omp parallel for num_threads(4)
for (i = 1; i < p; i++)
Y[i] = X[i - 1] + X[i];
int k = 2;
while (k < p)
{
int i;
#pragma omp parallel for
for (i = k; i < p; i++)
Y[i] = Y[i - k] + Y[i];
k += k;
}
for (int i = 0; i < p; i++)
printf("%d ", Y[i]);
printf("\n");
system("pause");
return 0;
}
What this code should do?
Input numbers are in
X
,
output numbers are (prefixes) inY
and the number count isp
.
X = 1, 5, 4, 2, 3
Stage I.
Y[0] = X[0];
Y[0] = 1
Stage II.
int i;
#pragma omp parallel for num_threads(4)
for (i = 1; i < p; i++)
Y[i] = X[i - 1] + X[i];
Example:
Y[1] = X[0] + X[1] = 6
Y[2] = X[1] + X[2] = 9
Y[2] = X[2] + X[3] = 6
Y[4] = X[3] + X[4] = 5
Stage III. (where I am stuck)
int k = 2;
while (k < p)
{
int i;
#pragma omp parallel for
for (i = k; i < p; i++)
Y[i] = Y[i - k] + Y[i];
k += k;
}
Example:
k = 2
Y[2] = Y[0] + Y[2] = 1 + 9 = 10
Y[3] = Y[1] + Y[3] = 6 + 6 = 12
Y[4] = Y[2] + Y[4] = 10 + 5 = 15
Above the 10 + 5 = 15
should be 9 + 5 = 14
, but the Y[2]
was overwritten by another thread. I want to use that Y[2]
what was before the for-loop started.
Example:
k = 4
Y[4] = Y[0] + Y[4] = 1 + 15 = 16
Result: 1, 6, 10, 12, 16
. Expected good result: 1, 6, 10, 12, 15
.
Upvotes: 1
Views: 1220
Reputation: 13
UPDATE for Stage III.
int num_threads = 8;
int k = 2;
while (k < p)
{
#pragma omp parallel for ordered num_threads(k < num_threads ? 1 : num_threads)
for (i = p - 1; i >= k; i--)
{
Y[i] = Y[i - k] + Y[i];
}
k += k;
}
The code above solved my problem. It's now working with parallel, except the first few round.
Upvotes: 0
Reputation: 180371
Above the
10 + 5 = 15
should be9 + 5 = 14
, but theY[2]
was overwritten by another thread. I want to use thatY[2]
what was before the for-loop started.
With OpenMP, you always have to consider whether your code is correct for the serial case, with a single thread, because
Your code is not correct serially. It appears you could fix that by running the problem loop backward, from i
= p - 1
to k
, but in fact that's not sufficient for parallel operation.
Your best bet appears to be to accumulate your partial results into a different array than holds the results of the previous cycle. For example, you might flip between X
and Y
as data source and result, with a little pointer wrangling to grease the iterative wheels. Or you might do it a little more easily by using a 2D array instead of separate X and Y.
Upvotes: 1