Dynamic/Nested Parallelism of GPU with OpenMP programming model

Question

My question is about the declare target construct of OpenMP and the dynamic/nested parallelism feature of GPUs. OpenACC 2.0 supports dynamic parallelism in two ways: the routine directive, and nesting parallel/kernels directives. In OpenMP, however, nesting device directives such as target, teams, or distribute is prohibited. Still, the declare target directive looks similar to OpenACC's routine directive.

However, its semantics and usage are still not clear to me. My sample code looks like this.

#pragma omp declare target
void foo(){

  //work-1

#pragma omp target teams distribute parallel for 
for(...)
{
    if(...){
       foo();
    } else {
      //work-2
    }
}

//work-3
}
#pragma omp end declare target

int main(){
//work 

foo();

}

In this example, I expect the function foo to be mapped as a device function. But since it contains a target construct, the compiler will have to transform the code in some way. My question is: what happens when threads encounter the call to foo inside the target construct?

Andrey Churbanov · Accepted Answer

Currently, OpenMP does not support nesting of target regions, so your code will most likely not compile. It may also crash at runtime when the nested target construct is encountered, or produce unexpected results. Basically, the behavior of a non-conforming program is unspecified.
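
For contrast, here is a minimal sketch of a layout that avoids the nesting: the declare target function contains no target construct, and the only target region lives in a host function that calls it. The names square and fill_squares are hypothetical and not from the question, and whether the loop is actually offloaded depends on your compiler and flags; without OpenMP support the pragmas are ignored and the code runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Device-callable helper (hypothetical name). It is compiled for both host
   and device and contains no target construct of its own. */
#pragma omp declare target
int square(int i)
{
    return i * i;
}
#pragma omp end declare target

/* Host function (hypothetical name): the single target region is here,
   and the declare target function is called from inside it. */
void fill_squares(int *out, int n)
{
    assert(out != NULL && n >= 0);

    #pragma omp target teams distribute parallel for map(from: out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = square(i);
    }
}
```

Calling fill_squares(out, 4) on a four-element array should leave 0, 1, 4, 9 in it. This does not give you dynamic parallelism; it only shows the declare target usage that stays within what the answer above describes as conforming.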
