Reputation: 2467
I've been trying to understand how the OpenMP parallel for loop works when combined with critical sections and ordered directives. There are a couple of code samples which I find confusing:
1. OpenMP parallel for loop is used to initialize the array s with the loop index i and the thread ID. No ordered directives or critical sections are used.
#include <stdio.h>
#include <omp.h>

#define N 10
#define CHUNKSIZE 1

int main(int argc, char* argv[])
{
    int i, chunk = CHUNKSIZE;
    char s[N][22];

    #pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk)
    for (i = 0; i < N; ++i)
    {
        int tid = omp_get_thread_num();
        sprintf(s[i], "%d:%d", i, tid);
        printf("i: %d tid: %d\n", i, tid);
    }

    puts("\nArray initialization order:");
    for (i = 0; i < N; ++i)
        puts(s[i]);
}
It prints the following:
i: 7 tid: 7
i: 4 tid: 4
i: 5 tid: 5
i: 6 tid: 6
i: 0 tid: 0
i: 8 tid: 0
i: 3 tid: 3
i: 1 tid: 1
i: 2 tid: 2
i: 9 tid: 1
Array initialization order:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:0
9:1
I am failing to figure out why s contains the i indices (first number) in a strict sequence despite the absence of the ordered directives and why printf("i: %d tid: %d\n", i, tid) shows them in a different order?
2. Adding ordered to the omp parallel for clause doesn't seem to change anything unless omp ordered is put inside the loop body.
#pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk) ordered
for (i = 0; i < N; ++i)
{
    int tid = omp_get_thread_num();
    sprintf(s[i], "%d:%d", i, tid);
    printf("i: %d tid: %d\n", i, tid);
}
Produces the same result as before: sprintf(s[i], "%d:%d", i, tid) initializes the array with a strict sequence of i, whereas printf("i: %d tid: %d\n", i, tid) prints i in an arbitrary order.
#pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk) ordered
for (i = 0; i < N; ++i)
{
    int tid = omp_get_thread_num();
    sprintf(s[i], "%d:%d", i, tid);
    #pragma omp ordered
    printf("i: %d tid: %d\n", i, tid);
}
Now everything happens in the sequence of i:
i: 0 tid: 0
i: 1 tid: 1
i: 2 tid: 2
i: 3 tid: 3
i: 4 tid: 4
i: 5 tid: 5
i: 6 tid: 6
i: 7 tid: 7
i: 8 tid: 0
i: 9 tid: 1
Array initialization order:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:0
9:1
Again, I don't understand why we need to place omp ordered inside the loop body to enforce the order of the prints, whereas the array initialization doesn't need that.
3. Use a critical section to ensure that only one thread at a time executes the loop body:
#pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk) ordered
for (i = 0; i < N; ++i)
    #pragma omp critical
    {
        int tid = omp_get_thread_num();
        sprintf(s[i], "%d:%d", i, tid);
        printf("i: %d tid: %d\n", i, tid);
    }
Again, it prints i in an arbitrary order, and initializes s in a strict order of i:
i: 1 tid: 1
i: 4 tid: 4
i: 3 tid: 3
i: 2 tid: 2
i: 5 tid: 5
i: 0 tid: 0
i: 7 tid: 7
i: 6 tid: 6
i: 8 tid: 0
i: 9 tid: 1
Array initialization order:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:0
9:1
This is totally bewildering since in my understanding the critical section must guarantee that sprintf and printf statements are executed by the same thread without any interruptions.
Any help to clear this up will be highly appreciated.
Upvotes: 0
Views: 1445
Reputation: 74385
I am failing to figure out why s contains the i indices (first number) in a strict sequence despite the absence of the ordered directives and why printf("i: %d tid: %d\n", i, tid) shows them in a different order?
With static scheduling there is a fixed mapping between loop iterations and the threads that execute them, which is why, no matter how many times you run the program, s[i] will always be set to "i:same_thread_id" as long as the number of threads stays the same. Printing s[] takes place in a sequential loop outside the parallel region, hence the output is ordered. I would be more surprised if that loop were to print things out of order. As for the printf() calls within the parallel region, you have schedule(static,1), which means consecutive iterations are handed out round-robin to different threads, and those threads run in arbitrary order relative to one another.
Adding ordered to the omp parallel for clause doesn't seem to change anything unless omp ordered is put inside the loop body.
That is exactly how ordered works. There are the ordered clause and the ordered region. The clause modifies the behaviour of the for worksharing construct and enables ordered execution of the denoted region inside. There are additional synchronisation requirements for ordered execution to work properly that aren't needed otherwise, which is why the clause exists. Also, the region exists so that only a part of the loop can run in order. Having the entire loop body run in order is meaningless as it is no different from sequential (non-parallel) loop execution. See this answer of mine for more details.
This is totally bewildering since in my understanding the critical section must guarantee that sprintf and printf statements are executed by the same thread without any interruptions.
Critical sections guarantee that no two threads execute the same code region simultaneously. They in no way enforce the order of the encountering threads. Since no two threads access the same element of s[], having a critical section in 3. changes nothing. It serialises the loop execution since no two threads can execute the body at the same time, but it doesn't make the loop run in sequential order.
Upvotes: 2