See edit below for my preliminary solution
Consider the following code:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    int counter = 0;
    int i;
    omp_set_num_threads(8);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        #pragma omp for private(i)
        for (i = 0; i < 10; i++) {
            printf("i: %d thread: %d\n", i, id);
            #pragma omp critical // or atomic
            counter++;
        }
    }
    printf("counter %d\n", counter);
    return 0;
}
I set the number of threads to 8. I would like each of the 8 threads to run the complete for loop, incrementing the variable counter. However, it seems that OpenMP parallelizes the for loop instead, splitting its iterations across the threads:
i: 0 thread: 0
i: 1 thread: 0
i: 4 thread: 2
i: 6 thread: 4
i: 2 thread: 1
i: 3 thread: 1
i: 7 thread: 5
i: 8 thread: 6
i: 5 thread: 3
i: 9 thread: 7
counter 10
Consequently, counter=10, but I want counter=80. What can I do so that every thread performs its own for loop while all threads increment counter?
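To make the difference concrete, here is a minimal sketch that counts how many loop iterations actually run with and without the omp for worksharing construct. The function name total_iterations and its parameters are hypothetical; the num_threads clause stands in for omp_set_num_threads(8), and team simply counts the threads that took part so the result can be checked. It assumes an OpenMP-enabled build (e.g. -fopenmp with GCC or Clang); without such a flag the pragmas are ignored and everything runs on one thread.

```c
/* Hypothetical helper: runs a 10-iteration loop inside a parallel
   region of `nthreads` threads, either shared out with `omp for`
   (split != 0) or replicated in every thread (split == 0).
   Returns the total number of iterations executed; *team_out
   receives the number of threads that took part. */
int total_iterations(int nthreads, int split, int *team_out)
{
    int total = 0, team = 0;
    #pragma omp parallel num_threads(nthreads) reduction(+:total, team)
    {
        team++;                      /* each thread adds 1 to its private copy */
        if (split) {                 /* same value in every thread */
            /* worksharing: the 10 iterations are divided among the threads */
            #pragma omp for
            for (int i = 0; i < 10; i++)
                total++;
        } else {
            /* no worksharing: every thread runs all 10 iterations */
            for (int i = 0; i < 10; i++)
                total++;
        }
    }
    *team_out = team;
    return total;
}
```

With split set, the total stays at 10 no matter how many threads run; without it, the total is team * 10, i.e. 80 when all 8 requested threads are granted.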
Edit: I added another outer for loop that runs from 0 to the maximum number of threads and distributed it with parallel for. Inside this loop, each thread then runs the complete inner for loop on its own. Indeed, counter=80 in this case. Is this the optimal solution for this problem, or is there a better one? The following code gives the desired result:
#include <stdio.h>
#include <omp.h>

int main(void) {
    int counter = 0;
    int i, n;
    omp_set_num_threads(8);
    int mthreads = omp_get_max_threads();
    /* one outer iteration per thread; each runs the full inner loop */
    #pragma omp parallel for private(i)
    for (n = 0; n < mthreads; n++) {
        int id = omp_get_thread_num();
        for (i = 0; i < 10; i++) {
            printf("i: %d thread: %d\n", i, id);
            #pragma omp critical
            counter++;
        }
    }
    printf("counter %d\n", counter);
    return 0;
}
The solution is very simple: remove the omp for worksharing construct, so that every thread executes the whole loop:
#pragma omp parallel
{
    int id = omp_get_thread_num();
    for (int i = 0; i < 10; i++) {
        printf("i: %d thread: %d\n", i, id);
        #pragma omp critical // or atomic
        counter++;
    }
}
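For reference, here is a self-contained sketch of the same idea using the atomic alternative mentioned in the comment. The function name replicated_counter and its parameters are made up for illustration; num_threads(nthreads) replaces the omp_set_num_threads(8) call, and team counts the threads that actually ran so the result can be checked as team * iters.

```c
/* Hypothetical helper: a plain parallel region (no `omp for`), so
   every thread runs all `iters` iterations and updates the shared
   counter with `omp atomic` instead of `omp critical`. */
int replicated_counter(int nthreads, int iters, int *team_out)
{
    int counter = 0;   /* declared outside the region, hence shared */
    int team = 0;      /* shared too; counts participating threads */
    #pragma omp parallel num_threads(nthreads)
    {
        #pragma omp atomic
        team++;
        for (int i = 0; i < iters; i++) {   /* i is private: declared inside */
            #pragma omp atomic
            counter++;
        }
    }
    *team_out = team;
    return counter;
}
```

For a single scalar increment, atomic is usually the lighter choice, since it can map to a hardware atomic operation rather than a lock around an arbitrary block.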
Declaring i inside the control part of the for loop is a C99 feature and might require passing the compiler an option such as -std=c99. Otherwise, you could simply declare i at the beginning of the block, or you could declare it outside the region and make it private:
int i;
#pragma omp parallel private(i)
{
    int id = omp_get_thread_num();
    for (i = 0; i < 10; i++) {
        printf("i: %d thread: %d\n", i, id);
        #pragma omp critical // or atomic
        counter++;
    }
}
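The other option mentioned above, declaring i at the beginning of the block, could look like the following sketch. The function name replicated_counter_c89 and its parameters are hypothetical; the point is that a variable declared inside the parallel block is automatically private to each thread, so neither a private clause nor C99 loop declarations are needed.

```c
/* Hypothetical helper in C89 style: i is declared at the start of the
   parallel block, so each thread gets its own copy without a private
   clause and without needing -std=c99. */
int replicated_counter_c89(int nthreads, int iters, int *team_out)
{
    int counter = 0;   /* shared */
    int team = 0;      /* shared; counts participating threads */
    #pragma omp parallel num_threads(nthreads)
    {
        int i;         /* block scope: private to each thread */
        #pragma omp atomic
        team++;
        for (i = 0; i < iters; i++) {
            #pragma omp critical
            counter++;
        }
    }
    *team_out = team;
    return counter;
}
```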
Since you are not using the value of counter inside the parallel region, you could also use a sum reduction instead:
#pragma omp parallel reduction(+:counter)
{
    int id = omp_get_thread_num();
    for (int i = 0; i < 10; i++) {
        printf("i: %d thread: %d\n", i, id);
        counter++;
    }
}
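Why the reduction is only appropriate when counter is not read inside the region can be shown with a small sketch. The function name reduced_counter and its parameters are hypothetical, and it assumes OpenMP 3.1 or later for the max reduction. Inside the region each thread only sees its own private copy of counter, so the value observed there is the thread's partial count, not the running total.

```c
/* Hypothetical helper: with reduction(+:counter) each thread works on
   a private copy of counter that starts at 0; the copies are added
   into the original only when the region ends. */
int reduced_counter(int nthreads, int iters, int *seen_out, int *team_out)
{
    int counter = 0, team = 0, seen = 0;
    #pragma omp parallel num_threads(nthreads) \
            reduction(+:counter, team) reduction(max:seen)
    {
        team++;
        for (int i = 0; i < iters; i++)
            counter++;            /* private copy: no critical/atomic needed */
        seen = counter;           /* this thread's partial count, not the total */
    }
    *seen_out = seen;             /* largest value any thread saw inside */
    *team_out = team;
    return counter;               /* sum of all private copies */
}
```

With 8 threads and 10 iterations, seen should come out as 10 while the returned total is team * 10.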
OpenMP has a concept for this: reduction. Staying with your example:
#pragma omp parallel for reduction(+:counter)
for (int n = 0; n < mthreads; n++) {
    int id = omp_get_thread_num();
    for (int i = 0; i < 10; i++) {
        printf("i: %d thread: %d\n", i, id);
        counter++;
    }
}
This has the advantage that no critical section is needed around the increment. OpenMP combines the private copies of counter from all threads into the total by itself, and probably more efficiently.
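Conceptually, the reduction behaves roughly like the hand-written version below: each thread accumulates into a local variable and adds it to the shared total once. This is only a sketch of the idea with a hypothetical function name (manual_reduction); the OpenMP runtime is free to combine the private copies in whatever way it chooses.

```c
/* Hypothetical helper that spells out roughly what reduction(+:counter)
   does: each thread counts into its own local variable and adds it to
   the shared total once, at the end of its work. */
int manual_reduction(int nthreads, int iters, int *team_out)
{
    int counter = 0, team = 0;   /* shared */
    #pragma omp parallel num_threads(nthreads)
    {
        int local = 0;           /* this thread's own copy of the count */
        for (int i = 0; i < iters; i++)
            local++;             /* no synchronization inside the loop */
        #pragma omp atomic
        counter += local;        /* one synchronized update per thread */
        #pragma omp atomic
        team++;
    }
    *team_out = team;
    return counter;
}
```

Compared with a critical section inside the loop, this performs one synchronized update per thread instead of one per iteration, which is where the efficiency gain comes from.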
This can be formulated even more simply as:
#pragma omp parallel for reduction(+:counter)
for (int i = 0; i < mthreads*10; i++) {
    int id = omp_get_thread_num();
    printf("i: %d thread: %d\n", i, id);
    counter++;
}
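In the flattened loop, i runs from 0 to mthreads*10 - 1, so the original inner index 0..9 is no longer directly visible. If it is still needed, it can be recovered with % (and the outer index with /), as in this sketch. The function name flattened_counter and the parameter names copies and iters are hypothetical; copies plays the role of mthreads.

```c
/* Hypothetical helper: one flat loop over copies * iters iterations;
   the original inner index is recovered as k % iters (and the outer
   index would be k / iters). */
int flattened_counter(int copies, int iters, int *starts_out)
{
    int counter = 0, starts = 0;
    #pragma omp parallel for reduction(+:counter, starts)
    for (int k = 0; k < copies * iters; k++) {
        int i = k % iters;       /* inner-loop index, 0 .. iters-1 */
        if (i == 0)
            starts++;            /* one per original outer iteration */
        counter++;
    }
    *starts_out = starts;
    return counter;
}
```

Unlike the nested versions, the total here does not depend on how many threads the runtime provides: flattened_counter(8, 10, &starts) should return 80 with starts equal to 8.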
With some compilers you may still need a flag such as -std=c99 to declare variables inside the for loop. The advantage of declaring variables as locally as possible is that you don't have to mark them private or anything like that. And the easiest way is certainly to let OpenMP split the for loop by itself.