Reputation: 477
I am trying to parallelize the following program, but I don't know how to reduce on an array. I know it is not possible, but is there an alternative? Thanks. (I added a reduction on m, which is wrong, but I would like advice on how to do it properly.)
#include <iostream>
#include <stdio.h>
#include <time.h>
#include <omp.h>
using namespace std;
int main ()
{
    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [10];
    time_t start_time = time(NULL);
    #pragma omp parallel for private(m) reduction(+:m)
    for (int n=0 ; n<10 ; ++n ){
        for (int m=0; m<=n; ++m){
            S[n] += A[m];
        }
    }
    time_t end_time = time(NULL);
    cout << end_time-start_time;
    return 0;
}
Upvotes: 44
Views: 57652
Reputation: 1
OpenMP program with array
#include <iostream>
#include <omp.h>
#define N 10
int main() {
    double array[N];
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        array[i] = i;
    }
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += i * array[i];
    }
    std::cout << sum << "\n"; // print the result so it is not optimized away
    return 0;
}
Upvotes: 0
Reputation: 51393
Since none of the other answers mentioned it, I am adding this answer.

I am trying to parallelize the following program, but don't know how to reduce on an array. I know it is not possible to do so, but is there an alternative?

With OpenMP 4.5 you can reduce an array using pragmas, namely:
#pragma omp parallel for reduction(+:S)
A full running example:
#define S_SIZE 10
#include <stdio.h>
#include <time.h>
#include <omp.h>
int main ()
{
    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [S_SIZE] = {0};
    #pragma omp parallel for reduction(+:S[:S_SIZE])
    for (int n=0 ; n<S_SIZE ; ++n ){
        for (int m=0; m<=n; ++m){
            S[n] += A[m];
        }
    }
    int expected_output [] = {84, 114, 209, 303, 339, 412, 464, 487, 489, 502};
    for(int i = 0; i < S_SIZE; i++){
        if(S[i] == expected_output[i])
            printf("%d\n", S[i]);
        else
            printf("ERROR! it should have been %d instead of %d\n", expected_output[i], S[i]);
    }
    return 0;
}
Output:
84
114
209
303
339
412
464
487
489
502
Upvotes: 15
Reputation: 81
With a parallel loop, each thread will process a subset of the loop's indexes according to the scheduler. The array S does not need a reduction, because each index n of the outer loop is processed independently. There is also no race condition, since each thread writes to a different position S[n]. So the code above will work just fine using only the directive

#pragma omp parallel for

for the outer loop.
Upvotes: 3
Reputation: 33659
Yes, it is possible to do an array reduction with OpenMP. In Fortran it even has a construct for this. In C/C++ you have to do it yourself. Here are two ways to do it.

The first method makes a private version of S for each thread, fills them in parallel, and then merges them into S in a critical section (see the code below). The second method makes an array with dimensions 10*nthreads, fills this array in parallel, and then merges it into S without using a critical section. The second method is much more complicated and can have cache issues, especially on multi-socket systems, if you are not careful. For more details see Fill histograms (array reduction) in parallel with OpenMP without using a critical section
First method
int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
#pragma omp parallel
{
    int S_private[10] = {0};
    #pragma omp for
    for (int n=0 ; n<10 ; ++n ) {
        for (int m=0; m<=n; ++m){
            S_private[n] += A[m];
        }
    }
    #pragma omp critical
    {
        for(int n=0; n<10; ++n) {
            S[n] += S_private[n];
        }
    }
}
Second method
int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
int *S_private;
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread = omp_get_thread_num();
    #pragma omp single
    {
        S_private = new int[10*nthreads];
        for(int i=0; i<(10*nthreads); i++) S_private[i] = 0;
    }
    #pragma omp for
    for (int n=0 ; n<10 ; ++n )
    {
        for (int m=0; m<=n; ++m){
            S_private[ithread*10+n] += A[m];
        }
    }
    #pragma omp for
    for(int i=0; i<10; i++) {
        for(int t=0; t<nthreads; t++) {
            S[i] += S_private[10*t + i];
        }
    }
}
delete[] S_private;
Upvotes: 39
Reputation: 825
I have two remarks concerning Zboson's answer:
1. Method 1 is certainly correct, but the reduction loop actually runs serially, because of the #pragma omp critical, which is of course necessary as the partial matrices are local to each thread and the corresponding reduction has to be done by the thread owning the matrix.
2. Method 2: the initialization loop can be moved outside the single section and therefore become parallelizable.
The following program implements array reduction using openMP v4.0 user defined reduction facility:
/* Compile with:
gcc -Wall -fopenmp -o ar ar.c
Run with:
OMP_DISPLAY_ENV=TRUE OMP_NUM_THREADS=10 OMP_NESTED=TRUE ./ar
*/
#include <stdio.h>
#include <omp.h>
struct m10x1 {int v[10];};
int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
struct m10x1 S = {{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
int n,m=0;
void print_m10x1(struct m10x1 x){
    int i;
    for(i=0;i<10;i++) printf("%d ",x.v[i]);
    printf("\n");
}
struct m10x1 add_m10x1(struct m10x1 x,struct m10x1 y){
    struct m10x1 r ={{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
    int i;
    for (i=0;i<10;i++) r.v[i]=x.v[i]+y.v[i];
    return r;
}
#pragma omp declare reduction(m10x1Add: struct m10x1: \
    omp_out=add_m10x1(omp_out, omp_in)) initializer( \
    omp_priv={{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}} )
int main ()
{
    #pragma omp parallel for reduction(m10x1Add: S)
    for ( n=0 ; n<10 ; ++n )
    {
        for (m=0; m<=n; ++m){
            S.v[n] += A[m];
        }
    }
    print_m10x1(S);
}
This follows verbatim the complex number reduction example on page 97 of OpenMP 4.0 features.
Although the parallel version works correctly, there probably are performance issues, which I have not investigated: add_m10x1 takes and returns its arguments by value, so every partial reduction copies the whole struct. Said "performance issues" are of my own making, and it is completely straightforward not to introduce them: pass the operands by pointer and accumulate in place. The modified part of the code then is:
void add_m10x1(struct m10x1 * x,struct m10x1 * y){
    int i;
    #pragma omp parallel for
    for (i=0;i<10;i++) x->v[i] += y->v[i];
}
#pragma omp declare reduction(m10x1Add: struct m10x1: \
add_m10x1(&omp_out, &omp_in)) initializer( \
omp_priv={{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}} )
Upvotes: 11
Reputation: 78306
If translating your code to Fortran, which can use arrays in OpenMP reduction operations, doesn't appeal, you could use a bunch of temporary variables. For example
int S0, S1, S2, ..., S9;
...
#pragma omp parallel for private(...) shared(S0, S1, S2, ..., S9) \
reduction(+:S0, S1, S2, ..., S9)
for ...
This leaves you with the unappealing prospect of having to write some kind of if or case statement to determine which of the temporaries is to be updated. If your code is just an example you want to use for learning, carry on.
But if your intention is genuinely to write a parallel prefix sum routine then search around. This is a good place to start.
Upvotes: 0