OpenMP nesting not turning off

Question

I'm trying to manage nested parallel regions with OpenMP (4.5, via GCC 7.2.0) and I'm having some issues turning off nesting.

Sample program:

#include <stdio.h>
#include <omp.h>

void foobar() {
  int tid = omp_get_thread_num();
  #pragma omp parallel for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

int main(void) {
  omp_set_nested(0);
  #pragma omp parallel
  {
    foobar();
  }
  printf("\n");
  foobar();
  return 0;
}

What I'm expecting is that both the call to foobar() from the parallel region and the call from serial code will print 4 lines, something like

// parallel region foobar()
0 | 0
1 | 1
2 | 2
3 | 3
// serial region foobar()
0 | 0
0 | 1
0 | 2
0 | 3

I expect this because I have disabled nested parallelism. However, inside the parallel region I get 16 lines with the correct TID but with the OTID always 0 (i.e. it looks as if every thread is spawning its own team and executing the entire loop on it), and I get 4 lines outside (i.e. the parallel for is spawning 4 threads, as I would expect).

I feel like I'm missing something very obvious here, can anybody shed some light for me? Isn't disabling nesting supposed to turn that omp parallel for into a regular omp for, and distribute the work accordingly?

Gilles · Accepted Answer

Your issue comes from the false assumption that the omp for part of the directive will be interpreted, and the corresponding work distributed among the running threads, irrespective of which parallel region is active. In your code, the omp for is associated only with the parallel region declared in foobar(). When that region is active (meaning, since you disabled nested parallelism, when foobar() isn't called from another parallel region), your loop is distributed among the newly spawned threads. When it isn't, because foobar() is called from another parallel region, the inner region runs with a team of a single thread, so the omp for has nothing to share the loop across and the loop isn't distributed among the calling threads. Each of them executes the whole loop, which leads to the replicated printf() output that you see.
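To make that concrete, here is a minimal sketch (not part of the original code; the function name inner_team_size_without_nesting is made up for illustration) that records how many threads a parallel region gets when it is encountered inside an already active one while nesting is disabled. The #else branch only supplies serial stand-ins so the snippet still builds without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: only needed when built without -fopenmp). */
static void omp_set_nested(int on) { (void)on; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the largest team size seen by a parallel
   region nested inside another parallel region, with nesting disabled. */
int inner_team_size_without_nesting(void) {
  int inner = 0;
  omp_set_nested(0);
  /* Ask for 2 threads so the outer region is an active one. */
  #pragma omp parallel num_threads(2)
  {
    /* Inner region: still created, but not allowed to grow. */
    #pragma omp parallel
    {
      #pragma omp critical
      {
        if (omp_get_num_threads() > inner)
          inner = omp_get_num_threads();
      }
    }
  }
  return inner;
}
```

With nesting disabled I would expect this to return 1: the inner region still exists, but its team consists of the encountering thread alone, so a work-sharing loop bound to it cannot split the iterations.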

A possible solution would be something like this:

#include <stdio.h>
#include <omp.h>

void bar(int tid) {
  #pragma omp for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

void foobar() {
  int tid = omp_get_thread_num();
  int in_parallel = omp_in_parallel();
  if (!in_parallel) {
    #pragma omp parallel
    bar(tid);
  }
  else {
    bar(tid);
  }
}

int main() {
  #pragma omp parallel
  foobar();
  printf("\n");
  foobar();
  return 0;
}

I don't find this solution entirely satisfying, but I don't see a better one right now. Maybe I will get some enlightenment later...
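A quick way to check that this version runs the loop once in total, rather than once per calling thread, is to count the iterations. The sketch below is only a test harness under my own assumptions, not part of the solution: the counter and the *_counted / count_from_* names are hypothetical, and the #else branch is a serial stand-in for builds without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in (assumption: only needed when built without -fopenmp). */
static int omp_in_parallel(void) { return 0; }
#endif

static int iterations = 0; /* hypothetical shared counter */

/* Same shape as bar(): an orphaned omp for that binds to whichever
   parallel region is innermost at the time of the call. */
static void bar_counted(void) {
  #pragma omp for
  for (int i = 0; i < 4; i++) {
    #pragma omp atomic
    iterations++;
  }
}

/* Same shape as foobar(): open a parallel region only if none is active. */
static void foobar_counted(void) {
  if (!omp_in_parallel()) {
    #pragma omp parallel
    bar_counted();
  } else {
    bar_counted();
  }
}

/* Total iterations executed when called from inside a parallel region. */
int count_from_parallel(void) {
  iterations = 0;
  #pragma omp parallel
  foobar_counted();
  return iterations;
}

/* Total iterations executed when called from serial code. */
int count_from_serial(void) {
  iterations = 0;
  foobar_counted();
  return iterations;
}
```

Both count_from_parallel() and count_from_serial() should return 4, whereas the code in the question executes 4 iterations per calling thread in the first case.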

EDIT: I had another idea: do it the other way around and enable nested parallelism, but let only a single thread of the calling team execute the inner region whenever the function is called from an actual parallel region:

#include <stdio.h>
#include <omp.h>

void foobar() {
  int tid = omp_get_thread_num();
  omp_set_nested(1);
  #pragma omp single
  #pragma omp parallel for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

int main() {
  #pragma omp parallel
  foobar();
  printf("\n");
  foobar();
  return 0;
}

And this time the code looks much nicer without any duplication, and gives (for example):

$ OMP_NUM_THREADS=4 ./nested
3 | 2
3 | 3
3 | 1
3 | 0

0 | 3
0 | 1
0 | 0
0 | 2
