Reputation: 889
I do not fully understand how OpenMP parallel constructs behave with nested for loops. Consider the following code:
std::size_t idx;
std::size_t idx2;
omp_set_num_threads( 2 );
#pragma omp parallel default(shared) private(idx, idx2)
{
    for(std::size_t idx=0;idx<3;idx++)
    {
        for(std::size_t idx2=0;idx2<4;idx2++)
        {
            LOG("From thread "+std::to_string(omp_get_thread_num())+" idx "+std::to_string(idx)+" idx2 "+std::to_string(idx2));
        }
    }
}
This produces the following output:
From thread 0 idx 0 idx2 0
From thread 1 idx 0 idx2 0
From thread 0 idx 0 idx2 1
From thread 1 idx 0 idx2 1
From thread 0 idx 0 idx2 2
From thread 1 idx 0 idx2 2
From thread 0 idx 0 idx2 3
From thread 1 idx 0 idx2 3
From thread 0 idx 1 idx2 0
From thread 1 idx 1 idx2 0
From thread 0 idx 1 idx2 1
From thread 1 idx 1 idx2 1
From thread 0 idx 1 idx2 2
From thread 1 idx 1 idx2 2
From thread 0 idx 1 idx2 3
From thread 1 idx 1 idx2 3
From thread 0 idx 2 idx2 0
From thread 1 idx 2 idx2 0
From thread 0 idx 2 idx2 1
From thread 1 idx 2 idx2 1
From thread 0 idx 2 idx2 2
From thread 1 idx 2 idx2 2
From thread 0 idx 2 idx2 3
From thread 1 idx 2 idx2 3
What seems to happen above is that each of the 2 threads executes both nested loops in full, which produces the output above (2*3*4=24 log messages in total). That is straightforward.
But now consider the following code, where the inner for loop is marked with #pragma omp for:
std::size_t idx;
std::size_t idx2;
omp_set_num_threads( 2 );
#pragma omp parallel default(shared) private(idx, idx2)
{
    for(std::size_t idx=0;idx<3;idx++)
    {
        #pragma omp for
        for(std::size_t idx2=0;idx2<4;idx2++)
        {
            LOG("From thread "+std::to_string(omp_get_thread_num())+" idx "+std::to_string(idx)+" idx2 "+std::to_string(idx2));
        }
    }
}
This produces the following 3*4=12 log messages:
From thread 0 idx 0 idx2 0
From thread 1 idx 0 idx2 2
From thread 0 idx 0 idx2 1
From thread 1 idx 0 idx2 3
From thread 0 idx 1 idx2 0
From thread 1 idx 1 idx2 2
From thread 0 idx 1 idx2 1
From thread 1 idx 1 idx2 3
From thread 0 idx 2 idx2 0
From thread 0 idx 2 idx2 1
From thread 1 idx 2 idx2 2
From thread 1 idx 2 idx2 3
I would have expected both threads to again execute the two nested for loops in full and produce 24 output messages. Why is the output different in these two cases?
Upvotes: 3
Views: 1214
In the first case, #pragma omp parallel runs the entire parallel region once on each thread. This means both threads run both for loops in full, so each thread generates 3*4=12 lines of output, for 24 lines in total.
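A minimal sketch of that first case, assuming the log call is replaced by a shared counter so the total can be checked. The names RegionCount and count_replicated are hypothetical and not part of the question's code. The #ifdef _OPENMP guards only keep the sketch compilable when OpenMP is disabled, in which case the region runs on a single thread.

```cpp
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: team size and total inner-body executions.
struct RegionCount {
    int threads;
    int iterations;
};

// Runs the question's nested loops inside a plain parallel region and
// counts how many times the inner loop body executes across all threads.
RegionCount count_replicated(int requested_threads) {
    int threads = 1;
    int iterations = 0;
#ifdef _OPENMP
    omp_set_num_threads(requested_threads);
#else
    (void)requested_threads;  // no OpenMP: the region runs on one thread
#endif
    #pragma omp parallel default(shared)
    {
#ifdef _OPENMP
        // One thread records the team size actually granted by the runtime.
        #pragma omp single
        threads = omp_get_num_threads();
#endif
        // No worksharing directive here, so every thread runs all 3*4 iterations.
        for (std::size_t idx = 0; idx < 3; idx++) {
            for (std::size_t idx2 = 0; idx2 < 4; idx2++) {
                #pragma omp atomic
                iterations++;
            }
        }
    }
    return {threads, iterations};
}
```

Calling count_replicated(2) should report iterations equal to threads * 12, which is 24 when the runtime actually provides two threads.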
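In the second case, the inner #pragma omp for tells OpenMP that the iterations of the inner for loop on idx2 should be divided among the threads of the team. So instead of both threads executing every iteration of the inner loop (idx2 from 0 to 3), each iteration of the inner loop is executed exactly once, by one of the threads. The outer loop on idx is still run in full by both threads, which gives 3*4=12 lines in total.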
In the second output we should see each value of idx2 printed exactly once for each value of idx, from whichever thread was assigned that iteration.
For example, if idx could only be zero, the output might look something like:
From thread ? idx 0 idx2 0
From thread ? idx 0 idx2 1
From thread ? idx 0 idx2 2
From thread ? idx 0 idx2 3
where ? means it could be any thread in the team.
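The "exactly once" claim can be checked with a small sketch, assuming the log call is replaced by a per-iteration hit counter. The function name count_workshared and the flat 3*4 layout of the counters are illustrative choices, not something from the question. As a hedge, the #ifdef _OPENMP guards let the code build without OpenMP too.

```cpp
#include <array>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns how many times each (idx, idx2) pair was executed when the inner
// loop carries "#pragma omp for". Entry idx*4 + idx2 holds the count.
std::array<int, 12> count_workshared(int requested_threads) {
    std::array<int, 12> hits{};  // zero-initialised
    int* h = hits.data();        // plain pointer keeps the atomic update simple
#ifdef _OPENMP
    omp_set_num_threads(requested_threads);
#else
    (void)requested_threads;     // no OpenMP: the region runs on one thread
#endif
    #pragma omp parallel default(shared)
    {
        // Every thread runs the outer loop in full...
        for (std::size_t idx = 0; idx < 3; idx++) {
            // ...but the inner iterations are divided among the team.
            #pragma omp for
            for (std::size_t idx2 = 0; idx2 < 4; idx2++) {
                #pragma omp atomic
                h[idx * 4 + idx2]++;
            }
        }
    }
    return hits;
}
```

Every entry of the returned array should be 1, regardless of how many threads the runtime provides. Which thread handles which idx2 depends on the schedule, which is implementation-defined unless a schedule clause is given.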
Upvotes: 2