Sigma-Cosine
Sigma-Cosine

Reputation: 21

OpenMP custom scheduling

I have a parallel loop that can be parallelised easily. I am required to implement a custom scheduling option to the loop. To do this I will explicitly assign iterations to threads rather than use the standard parallel for region as in:

#pragma omp parallel for

Instead I just use:

#pragama omp parallel [declarations]

My code so far is as follows:

#define N 500
#pragma omp parallel 
{
int num_threads = omp_get_num_threads();
int thread = omp_get_thread_num();
int start = N*thread/num_threads;
int end = N*(thread+1)/num_threads;
   for (i=start; i<end; i++){
   /* LOOP */
   }
}

I need to apply a scheduling algorithm such that each thread is assigned a local set of iterations, which I believe I have already done. Each local set is split up into chunks which each thread executes. When the thread completes it's chunk then it finds the thread which has the most remaining chunks and begins executing those chunks. This process is repeated until no chunks remain. I'm struggling to get my head around how to begin this as I'm not entirely sure how to find out what the most loaded thread is and also how to I go about splitting the local set of iterations into chunks, whilst still remaining in the parallel region.

Upvotes: 2

Views: 830

Answers (1)

dlasalle
dlasalle

Reputation: 1725

What you need to implement is a work stealing algorithm. It sounds like what you want is to assign each thread a double-ended queue (sometimes called a dequeue). If you're allowing any thread to steal work from any other thread, you will want a lock associated with each queue to prevent two threads from attempt to steal the same piece of work.

You will also want to put each of these queues into a priority queue using their size as the key, in order to allow finished threads to get the thread with the most work remaining to steal from. You will also want a lock for that global priority queue, so as to maintain its integrity as updates are performed.

All that being said, when usuable, a dynamic schedule for #pragma omp for or spawning tasks (via #pragma omp task) can be preferable. If you do implement your own work stealing algorithm, you keep in mind choosing which thread to steal from based on a metric which will be the same for all threads will often lead to collision (i.e., all idle threads will attempt to steal from the same queue). Consider choosing where to steal work based on locality instead.

Upvotes: 1

Related Questions