astrophobia

Reputation: 889

behavior of pragma omp parallel with for loops

I do not fully understand the behavior of OpenMP parallel constructs with nested for loops. Consider the following code:

std::size_t idx;
std::size_t idx2;
omp_set_num_threads( 2 );

#pragma omp parallel default(shared) private(idx, idx2)
{

  for(std::size_t idx=0;idx<3;idx++)
  {
    for(std::size_t idx2=0;idx2<4;idx2++)
    {
      LOG("From thread "+std::to_string(omp_get_thread_num())+" idx "+std::to_string(idx)+" idx2 "+std::to_string(idx2));
    }
  }
}

This produces the following output:

From thread 0 idx 0 idx2 0
From thread 1 idx 0 idx2 0
From thread 0 idx 0 idx2 1
From thread 1 idx 0 idx2 1
From thread 0 idx 0 idx2 2
From thread 1 idx 0 idx2 2
From thread 0 idx 0 idx2 3
From thread 1 idx 0 idx2 3
From thread 0 idx 1 idx2 0
From thread 1 idx 1 idx2 0
From thread 0 idx 1 idx2 1
From thread 1 idx 1 idx2 1
From thread 0 idx 1 idx2 2
From thread 1 idx 1 idx2 2
From thread 0 idx 1 idx2 3
From thread 1 idx 1 idx2 3
From thread 0 idx 2 idx2 0
From thread 1 idx 2 idx2 0
From thread 0 idx 2 idx2 1
From thread 1 idx 2 idx2 1
From thread 0 idx 2 idx2 2
From thread 1 idx 2 idx2 2
From thread 0 idx 2 idx2 3
From thread 1 idx 2 idx2 3

Here both threads execute the two nested loops in full, so together they produce the output above (2*3*4=24 log messages in total), which is straightforward.

But now consider the following code, where the inner for loop is marked with #pragma omp for:

std::size_t idx;
std::size_t idx2;    
omp_set_num_threads( 2 );

#pragma omp parallel default(shared) private(idx, idx2)
{

  for(std::size_t idx=0;idx<3;idx++)
  {
    #pragma omp for
    for(std::size_t idx2=0;idx2<4;idx2++)
    {
      LOG("From thread "+std::to_string(omp_get_thread_num())+" idx "+std::to_string(idx)+" idx2 "+std::to_string(idx2));
    }
  }
}

This produces the following 3*4=12 log messages:

From thread 0 idx 0 idx2 0
From thread 1 idx 0 idx2 2
From thread 0 idx 0 idx2 1
From thread 1 idx 0 idx2 3
From thread 0 idx 1 idx2 0
From thread 1 idx 1 idx2 2
From thread 0 idx 1 idx2 1
From thread 1 idx 1 idx2 3
From thread 0 idx 2 idx2 0
From thread 0 idx 2 idx2 1
From thread 1 idx 2 idx2 2
From thread 1 idx 2 idx2 3

I would have expected both threads to execute the two nested for loops in full again and produce 24 output messages. Why is the output different in these two cases?

Upvotes: 3

Views: 1214

Answers (1)

user9614249


In the first case, #pragma omp parallel runs the entire parallel region once on each thread. This means both threads run both for loops in full, so each thread generates 3*4=12 lines of output, for 24 lines in total.
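To check the first case without relying on log output, the sketch below counts how often the innermost body runs. The names RegionCounts and count_replicated_iterations are made up for this illustration, and it assumes the code is compiled with OpenMP enabled (for example -fopenmp on GCC or Clang). The runtime may provide fewer than the two requested threads, so the team size is counted rather than assumed.

```cpp
#include <cstddef>

// Hypothetical helper type for this illustration.
struct RegionCounts {
    int team_size;  // threads that actually entered the parallel region
    int body_runs;  // how many times the innermost loop body executed
};

// Plain "#pragma omp parallel": every thread executes both loops in full.
RegionCounts count_replicated_iterations() {
    int team_size = 0;
    int body_runs = 0;

    #pragma omp parallel num_threads(2) default(shared)
    {
        // Each thread registers itself once.
        #pragma omp atomic
        ++team_size;

        for (std::size_t idx = 0; idx < 3; ++idx) {
            for (std::size_t idx2 = 0; idx2 < 4; ++idx2) {
                // Stands in for the LOG call in the question.
                #pragma omp atomic
                ++body_runs;
            }
        }
    }
    return {team_size, body_runs};
}
```

With a team of two threads, body_runs should come out as 2*12=24, matching the 24 log lines in the question. In general it is team_size*12.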

In the second case, the inner #pragma omp for is a worksharing construct: it tells OpenMP to divide the iterations of the inner loop over idx2 among the threads of the team. The outer loop over idx is still executed by both threads, but instead of each thread running the inner loop for every idx2 from 0 to 3, each iteration of the inner loop is executed exactly once, by one of the threads.

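In the second output we should therefore see every value of idx2 printed exactly once for each value of idx, for 3*4=12 lines in total. Which thread prints a given line depends on the loop schedule, which is implementation-defined when no schedule clause is given.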

For example, if idx could only be zero, the output might look something like:

From thread ? idx 0 idx2 0
From thread ? idx 0 idx2 1
From thread ? idx 0 idx2 2
From thread ? idx 0 idx2 3

where ? stands for whichever thread was assigned that iteration.
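The "exactly once" claim for the second case can be checked with a small sketch like the following. The function name count_workshared_hits is invented for this example, and it again assumes compilation with OpenMP enabled (for example -fopenmp on GCC or Clang). It records how many times each (idx, idx2) pair is visited rather than which thread visited it.

```cpp
#include <cstddef>
#include <vector>

// Returns, for each (idx, idx2) pair, how many times the loop body ran.
// The result is flattened so that entry idx * 4 + idx2 belongs to that pair.
std::vector<int> count_workshared_hits() {
    int hits[3][4] = {};  // shared by all threads in the team

    #pragma omp parallel num_threads(2) default(shared)
    {
        // Every thread runs the outer loop in full.
        for (std::size_t idx = 0; idx < 3; ++idx) {
            // The inner iterations are divided among the team. The implicit
            // barrier at the end of the construct keeps the threads in step.
            #pragma omp for
            for (std::size_t idx2 = 0; idx2 < 4; ++idx2) {
                // Stands in for the LOG call in the question.
                #pragma omp atomic
                ++hits[idx][idx2];
            }
        }
    }

    // Copy the 3x4 table into a flat vector for the caller.
    std::vector<int> flat;
    for (auto &row : hits) {
        flat.insert(flat.end(), row, row + 4);
    }
    return flat;
}
```

Every one of the 12 entries should be 1 regardless of how many threads the runtime provides, which corresponds to the 12 log lines in the second output.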

Upvotes: 2
