pauliwago
pauliwago

Reputation: 6685

OpenMP Nested Loop, what is the code doing?

I'm having issues understanding how OpenMP works with nested loops. Please help!

I got the following code to run in parallel:

#pragma omp parallel private(i)
{
for(i=0; i<n; i++)
{
    #pragma omp for
    for(j=0; j<n; j++)
    {
        if(asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
    }
    #pragma omp for
    for(j=0; j<n; j++)
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
}
}

However, I'm not quite sure how this code works (I just got it by luck). Here's what I think it's doing...

So for(i=0; i<n; i++) is split off into different threads and run in parallel. Because i was declared to be private, each instance of the loop is "sandboxed"; that is, any changes to j remain in that thread (at least until all the threads are done?). I'm confused because not declaring #pragma omp for causes the code to break...I'm not sure why this is.

If someone could walk me through what this code is doing, I'd appreciate it very much! Thank you!

Upvotes: 0

Views: 387

Answers (1)

Hristo Iliev
Hristo Iliev

Reputation: 74445

This is what one typically does in order to reduce the overhead of multiple enters and exists into and out of a parallel region. Compare the code from your question to the following equivalent code:

for (i=0; i<n; i++)
{
    #pragma omp parallel for
    for (j=0; j<n; j++)
    {
        if (asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
    }
    #pragma omp parallel for
    for (j=0; j<n; j++)
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
}

It runs the outer loop over i n times. In each iteration of the outer loop, it runs two parallel loops over j, splitting the n iterations among the threads. The problem here is that you have two parallel regions inside, each of which is activated n times. This could add significant overhead for a certain interval of values of n (the actual interval would depend on many factors).

In order to reduce the overhead, one puts the whole code inside a parallel region. Thus each thread would execute all the iterations of the outer loop and the inner iterations would still be split among the threads. If you remove the worksharing constructs, then the inner loop would not get distributed among the threads. Instead, each thread would execute both the full range of outer and inner iterations, so for example asubsref(bin,j) would get incremented up to num_threads times instead of just once (why "up to"? hint: unprotected concurrent data access).

Most of the times when one runs an outer loop inside a parallel construct, one wants different threads to keep ticking on the same iterations. This could be achieved by a barrier at the end of the loop body:

#pragma omp parallel private(i)
{
    for (i = 0; i < n; i++)
    {
        ...
        #pragma omp barrier
    }
}

In your case an explicit barrier is not necessary since the for worksharing construct has an implicit barrier at its end.

Upvotes: 2

Related Questions