Emilien

OpenMP scheduling

I have a piece of code with two nested for loops. When the outer loop has few iterations the inner one has many, and vice versa. I can parallelize either loop on its own with an omp for directive and get consistent results (and some speedup). However, I'd like to:

  1. Run the outer loop in parallel if it has 16 or more iterations.
  2. Otherwise, run the inner loop in parallel (and not the outer one, even if it has 8 iterations).

This is not nested parallelism, since only one of the two loops is parallel at any time. When I parallelize each loop on its own and watch the threads with top -H, I sometimes see only one thread and sometimes more (in both cases). So would what I want to do make sense and actually improve performance?

So far I did something like this:

#pragma omp parallel
{
    #pragma omp for schedule(static,16)
    for(...){
        /* some declarations */
        #pragma omp for schedule(static,16) nowait
        for(...){
            /* ... */
        }
    }
}

This does not compile (work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region), and it would not behave as I described anyway. I also tried collapse, but had problems with the /* some declarations */ part. I'd also rather avoid it, since it requires OpenMP 3.0 and I'm not sure the compiler for the target hardware supports it.
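For reference, a minimal sketch of the collapse route, assuming a hypothetical kernel fill_collapsed that writes out[i*ncols + j] = i*scale + j, with row_offset standing in for the per-row declarations. Because collapse(2) in OpenMP 3.0 requires the collapsed loops to be perfectly nested, the declaration has to move into the inner body and is recomputed for every (i, j) pair; that is the friction the declarations cause.

```c
#include <stddef.h>

/* Hypothetical kernel: out[i*ncols + j] = i * scale + j.
   row_offset stands in for the "some declarations" that the original
   code computes once per outer iteration. */
void fill_collapsed(double *out, int nrows, int ncols, double scale)
{
    /* collapse(2) needs perfectly nested loops: nothing may sit between
       the two for statements, so the per-row value is moved into the
       inner body and recomputed for every (i, j) pair. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < nrows; i++) {
        for (int j = 0; j < ncols; j++) {
            double row_offset = i * scale;   /* formerly declared in the outer loop */
            out[(size_t)i * ncols + j] = row_offset + j;
        }
    }
}
```

Compiled without OpenMP support the pragma is simply ignored and the loops run serially.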

Any ideas?

Answers (1)

Hristo Iliev

You cannot nest work-sharing constructs that bind to the same parallel region, but you can use nested parallelism and selectively deactivate the regions with the if(condition) clause. If the condition evaluates to true at run time, the region is active; otherwise it executes serially, with a team of one thread. It would look like this:

/* Make sure nested parallelism is enabled */
omp_set_nested(1);

#pragma omp parallel for schedule(static) if(outer_steps>=16)
for(...){
    /* some declarations */
    #pragma omp parallel for if(outer_steps<16)
    for(...){
        /* ... */
    }
}
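A self-contained version of the same pattern, as a sketch: fill_nested_if, the kernel and the OUTER_THRESHOLD macro are hypothetical stand-ins for the elided loops, with nrows playing the role of outer_steps. The _OPENMP guard lets the file also compile as plain serial C, where the pragmas are ignored. Note that omp_set_nested is deprecated as of OpenMP 5.0 in favor of omp_set_max_active_levels, but it is the call that pre-3.0 compilers provide.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical threshold from the question: parallelize the outer loop
   only when it has at least 16 iterations. */
#define OUTER_THRESHOLD 16

/* Hypothetical kernel: out[i*ncols + j] = i * scale + j. */
void fill_nested_if(double *out, int nrows, int ncols, double scale)
{
#ifdef _OPENMP
    omp_set_nested(1);   /* deprecated in OpenMP 5.0; replacement is omp_set_max_active_levels */
#endif
    /* Multi-threaded only when the outer loop is long enough. */
    #pragma omp parallel for schedule(static) if(nrows >= OUTER_THRESHOLD)
    for (int i = 0; i < nrows; i++) {
        double row_offset = i * scale;   /* per-row declaration can stay here */
        /* Multi-threaded only when the outer region is not. */
        #pragma omp parallel for schedule(static) if(nrows < OUTER_THRESHOLD)
        for (int j = 0; j < ncols; j++)
            out[(size_t)i * ncols + j] = row_offset + j;
    }
}
```

With 2 rows only the inner region gets a team; with 16 or more only the outer one does.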

The drawback here is that the inner parallel region still adds a small overhead on every outer iteration, even when it is inactive at run time. If you need maximum efficiency and are willing to sacrifice some maintainability for it, you can write two different implementations of the loop nest and branch to the appropriate one based on the value of outer_steps.
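A sketch of that two-implementation approach, using the same hypothetical kernel and hypothetical names (fill_dispatch, fill_outer_parallel, fill_inner_parallel, OUTER_THRESHOLD). The inner-parallel variant opens a single parallel region around the serial outer loop and work-shares the inner loop with omp for on each iteration. This is legal because the outer for is an ordinary loop executed by every thread, not a work-sharing construct, so it needs neither nested parallelism nor a new region per outer iteration.

```c
#include <stddef.h>

#define OUTER_THRESHOLD 16   /* hypothetical; from the question's "16 or more" */

/* Outer loop parallel, inner loop serial: no inner region at all. */
static void fill_outer_parallel(double *out, int nrows, int ncols, double scale)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nrows; i++) {
        double row_offset = i * scale;
        for (int j = 0; j < ncols; j++)
            out[(size_t)i * ncols + j] = row_offset + j;
    }
}

/* Outer loop serial, inner loop work-shared.  Every thread runs the i loop
   (i and row_offset are private), and all threads meet the same omp for
   on each iteration, as work-sharing requires. */
static void fill_inner_parallel(double *out, int nrows, int ncols, double scale)
{
    #pragma omp parallel
    for (int i = 0; i < nrows; i++) {
        double row_offset = i * scale;
        #pragma omp for schedule(static)
        for (int j = 0; j < ncols; j++)
            out[(size_t)i * ncols + j] = row_offset + j;
    }
}

/* Pick the implementation from the outer trip count. */
void fill_dispatch(double *out, int nrows, int ncols, double scale)
{
    if (nrows >= OUTER_THRESHOLD)
        fill_outer_parallel(out, nrows, ncols, scale);
    else
        fill_inner_parallel(out, nrows, ncols, scale);
}
```

Each omp for still ends with an implicit barrier, so the threads synchronize once per outer iteration; that is usually cheaper than forking a nested team each time, but it is worth measuring on the target hardware.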
