wayi
wayi

Reputation: 523

OpenMP: Divide all the threads into different groups

I will like to divide all the threads into 2 different groups, since I have two parallel tasks to run asynchronously. For example, if totally 8 threads are available, I will like 6 threads dedicated to task1, and the other 2 dedicated to task2.

How can I achieve this with OpenMP?

Upvotes: 4

Views: 3090

Answers (1)

Jonathan Dursi
Jonathan Dursi

Reputation: 50927

This is a job for OpenMP nested parallelism, as of OpenMP 3: you can use OpenMP tasks to start two independent tasks and then within those tasks, have parallel sections which use the appropriate number of threads.

As a quick example:

#include <stdio.h>
#include <omp.h>

int main(int argc, char **argv) {

    omp_set_nested(1);   /* make sure nested parallism is on */
    int nprocs = omp_get_num_procs();
    int nthreads1 = nprocs/3;
    int nthreads2 = nprocs - nthreads1;

    #pragma omp parallel default(none) shared(nthreads1, nthreads2) num_threads(2)
    #pragma omp single
    {
        #pragma omp task
        #pragma omp parallel for num_threads(nthreads1)
        for (int i=0; i<16; i++)
            printf("Task 1: thread %d of the %d children of %d: handling iter %d\n",
                        omp_get_thread_num(), omp_get_team_size(2),
                        omp_get_ancestor_thread_num(1), i);
        #pragma omp task
        #pragma omp parallel for num_threads(nthreads2)
        for (int j=0; j<16; j++)
            printf("Task 2: thread %d of the %d children of %d: handling iter %d\n",
                        omp_get_thread_num(), omp_get_team_size(2),
                        omp_get_ancestor_thread_num(1), j);
    }

    return 0;
}

Running this on an 8 core (16 hardware threads) node,

$ gcc -fopenmp nested.c -o nested -std=c99
$ ./nested
Task 2: thread 3 of the 11 children of 0: handling iter 6
Task 2: thread 3 of the 11 children of 0: handling iter 7
Task 2: thread 1 of the 11 children of 0: handling iter 2
Task 2: thread 1 of the 11 children of 0: handling iter 3
Task 1: thread 2 of the 5 children of 1: handling iter 8
Task 1: thread 2 of the 5 children of 1: handling iter 9
Task 1: thread 2 of the 5 children of 1: handling iter 10
Task 1: thread 2 of the 5 children of 1: handling iter 11
Task 2: thread 6 of the 11 children of 0: handling iter 12
Task 2: thread 6 of the 11 children of 0: handling iter 13
Task 1: thread 0 of the 5 children of 1: handling iter 0
Task 1: thread 0 of the 5 children of 1: handling iter 1
Task 1: thread 0 of the 5 children of 1: handling iter 2
Task 1: thread 0 of the 5 children of 1: handling iter 3
Task 2: thread 5 of the 11 children of 0: handling iter 10
Task 2: thread 5 of the 11 children of 0: handling iter 11
Task 2: thread 0 of the 11 children of 0: handling iter 0
Task 2: thread 0 of the 11 children of 0: handling iter 1
Task 2: thread 2 of the 11 children of 0: handling iter 4
Task 2: thread 2 of the 11 children of 0: handling iter 5
Task 1: thread 1 of the 5 children of 1: handling iter 4
Task 2: thread 4 of the 11 children of 0: handling iter 8
Task 2: thread 4 of the 11 children of 0: handling iter 9
Task 1: thread 3 of the 5 children of 1: handling iter 12
Task 1: thread 3 of the 5 children of 1: handling iter 13
Task 1: thread 3 of the 5 children of 1: handling iter 14
Task 2: thread 7 of the 11 children of 0: handling iter 14
Task 2: thread 7 of the 11 children of 0: handling iter 15
Task 1: thread 1 of the 5 children of 1: handling iter 5
Task 1: thread 1 of the 5 children of 1: handling iter 6
Task 1: thread 1 of the 5 children of 1: handling iter 7
Task 1: thread 3 of the 5 children of 1: handling iter 15

Updated: I've changed the above to include the thread ancestor; there was come confusion because there were (for instance) two "thread 1"s printed - here I've also printed the ancestor (e.g., "thread 1 of the 5 children of 1" vs "thread 1 of the 11 children of 0").

From the OpenMP standard, S.3.2.4, “The omp_get_thread_num routine returns the thread number, within the current team, of the calling thread.”, and from section 2.5, “When a thread encounters a parallel construct, a team of threads is created to execute the parallel region [...] The thread that encountered the parallel construct becomes the master thread of the new team, with a thread number of zero for the duration of the new parallel region.

That is, within each of those (nested) parallel regions, teams of threads are created which have thread ids starting at zero; but just because those ids overlap within the team doesn't mean they're the same threads. Here I've emphasized that by printing their ancestor number as well, but if the threads were doing CPU-intensive work you'd also see with monitoring tools that there were indeed 16 active threads, not just 11.

The reason why they are team-local thread numbers and not globally-unique thread numbers is pretty straightforward; it would be almost impossible to keep track of globally-unique thread numbers in an environment where nested and dynamic parallelism can happen. Say there are three teams of threads, numbered [0..5], [6,..10], and [11..15], and the middle team completes. Do we leave gaps in the thread numbering? do we interrupt all threads to change their global numbers? What if a new team is started, with 7 threads? Do we start them at 6 and have overlapping thread ids, or do we start them at 16 and leave gaps in the numbering?

Upvotes: 9

Related Questions