Richard

Reputation: 513

OpenMP: for loop with changing number of iterations

I would like to use OpenMP to make my program run faster. Unfortunately, the opposite is the case. My code looks something like this:

const int max_iterations = 10000;
int num_interation = std::numeric_limits<int>::max();

#pragma omp parallel for
for(int i = 0; i < std::min(num_interation, max_iterations); i++)
{
  // do sth.

  // update the number of required iterations
  // num_interation can only become smaller over time
  num_interation = update_iterations(...);
}

For some reason, many more iterations are processed than required. Without OpenMP, it takes 500 iterations on average. However, even when setting the number of threads to one (set_num_threads(1)), it computes more than one thousand iterations. The same happens if I use multiple threads, and also when using a write lock when updating num_iterations.

I would assume that it has something to do with memory bandwidth or a race condition. But those problems should not appear in the case of set_num_threads(1).

Therefore, I assume that it could have something to do with the scheduling and the chunk size. However, I am really not sure about this.

Can somebody give me a hint?

Upvotes: 0

Views: 1323

Answers (1)

Gilles

Reputation: 9489

A quick answer for the behaviour you experience is given by the OpenMP standard, page 56:

The iteration count for each associated loop is computed before entry to the outermost loop. If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

In essence, this means that you cannot modify the bounds of your loop once you have entered it. Although the standard says the behaviour is "unspecified", in your case what happens is quite clear: as soon as you enable OpenMP on your code, the full iteration count you specified initially is computed up front and used for the whole loop.

So you have to take another approach to this problem.

This is a possible solution (amongst many others), which I hope scales OK. It has the drawback of potentially allowing more iterations than you intended (up to OMP_NUM_THREADS-1 extra iterations if // do sth. is balanced, and many more if it isn't). It also assumes that update_iterations(...) is thread-safe and can be called in parallel without unwanted side effects... This is a very strong assumption which you'd better enforce!

num_interation = std::min(num_interation, max_iterations);
#pragma omp parallel
{
    int i = omp_get_thread_num();
    const int nbth = omp_get_num_threads();
    while ( i < num_interation ) { // unsynchronised read: a stale value only costs a few extra iterations
        // do sth.

        // update the number of required iterations
        // num_interation can only become smaller over time
        int new_num_interation = update_iterations(...);
        #pragma omp critical
        num_interation = std::min(num_interation, new_num_interation);
        i += nbth;
    }
}

A more synchronised solution, for cases where // do sth. isn't well balanced and avoiding too many extra iterations matters, could be:

num_interation = std::min(num_interation, max_iterations);
int nb_it_done = 0;
#pragma omp parallel
{
    int i = omp_get_thread_num();
    const int nbth = omp_get_num_threads();
    while ( nb_it_done < num_interation ) {
        // do sth.

        // update the number of required iterations
        // num_interation can only become smaller over time
        int new_num_interation = update_iterations(i);
        #pragma omp critical
        num_interation = std::min(num_interation, new_num_interation);
        i += nbth;
        #pragma omp single
        nb_it_done += nbth;
    }
}

Another thing to note: since you didn't show what i is used for, it isn't clear whether iterating somewhat randomly over the domain is a problem. If it isn't, the first solution should work well, even for an unbalanced // do sth.. But if it is a problem, then you'd better stick with the second solution (and even potentially strengthen the synchronisation).

But at the end of the day, there is no way (that I can think of, at least not with decent parallelism) to avoid potentially doing extra work, since the number of iterations can change along the way.

Upvotes: 2
