Reputation: 2893

Selectively enable an OpenMP for loop within a parallel region

Is it possible to selectively enable an openmp directive with template parameter or run time variable?

this (all threads work on the same for loop).
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 10; ++i) { /*...*/ }
}
versus this (each thread works on its own for loop)
#pragma omp parallel
{
    for (int i = 0; i < 10; ++i) { /*...*/ }
}

Update (Testing if clause)

test.cpp:

#include <iostream>
#include <omp.h>

int main() {
    bool var = true;
    #pragma omp parallel 
    {
        #pragma omp for if (var)
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}

Error message (g++ 6, compiled with g++ test.cpp -fopenmp)

test.cpp: In function ‘int main()’:
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’
         #pragma omp for if (var)
                         ^~

Upvotes: 2

Answers (3)

hamster on wheels

Reputation: 2893

#include <omp.h>
#include <sstream>
#include <vector>
#include <iostream>
int main() {
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        if (var) {
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            }
        } else {
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            } // code duplication
        }
    }
    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

Upvotes: 0

Zulan

Reputation: 22670

I think the idiomatic C++ solution is to hide the different OpenMP pragmas behind algorithmic overloads.

#include <iostream>
#include <sstream>
#include <vector>
#include <omp.h>

#include <type_traits>
template <bool ALL_PARALLEL>
struct impl;

template<>
struct impl<true>
{
  template<typename ITER, typename CALLABLE>
  void operator()(ITER begin, ITER end, const CALLABLE& func) {
    #pragma omp parallel
    {
      for (ITER i = begin; i != end; ++i) {
        func(i);
      }
    }
  }
};

template<>
struct impl<false>
{
  template<typename ITER, typename CALLABLE>
  void operator()(ITER begin, ITER end, const CALLABLE& func) {
    #pragma omp parallel for
    for (ITER i = begin; i < end; ++i) {
      func(i);
    }
  }
};

// This is just so we don't have to write parallel_foreach()(...)
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE>
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func)
{
    impl<ALL_PARALLEL>()(begin, end, func);
}

int main()
{
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    parallel_foreach<var>(0, 8, [&s](auto i) {
        s[omp_get_thread_num()] << i << ", ";
    });

    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

If you use some specific types, you can do an overload by type instead of to using the bool template parameter and iterate through elements rather than the numerical indexed loop. Note that you can use C++ random access iterators in OpenMP worksharing loops! Depending on your types you might very well be able to implement an iterator that hides everything about the internal data access form the caller.

Upvotes: 1

hamster on wheels

Reputation: 2893

Sort of a work around. Don't know if it is possible to get rid of conditionals for getting the thread id.

#include <iostream>
#include <omp.h>
#include <sstream>
#include <vector>
int main() {
    constexpr bool var = true;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel if (var) 
    {

        const int thread_id0 = omp_get_thread_num();
        #pragma omp parallel
        {
            int thread_id1;
            if (var) {
                thread_id1 = thread_id0;
            } else {
                thread_id1 = omp_get_thread_num();
            }

            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id1] << i << ", ";
            }
        }
    }

    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

Output (when var == true):

n_threads: 8
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7,

Output (when var == false):

n_threads: 8
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7,

Upvotes: 1

Selectively enable an OpenMP for loop within a parallel region

Answers (3)

Related Questions