InsideLoop

Reputation: 6255

OpenMP and nested parallelism

I would like to "nest" parallel for using OpenMP. Here is a toy code:

#include <iostream>
#include <cmath>

void subproblem(int m) {
  #pragma omp parallel for
  for (int j{0}; j < m; ++j) {
    double sum{0.0};
    for (int k{0}; k < 10000000; ++k) {
      sum += std::cos(static_cast<double>(k));
    }
    #pragma omp critical
    { std::cout << "Sum: " << sum << std::endl; }
  }
}

int main(int argc, const char *argv[]) {
  int n{2};
  int m{8};

  #pragma omp parallel for
  for (int i{0}; i < n; ++i) {
    subproblem(m);
  }

  return 0;
}

Here is what I want: so far, I have only found solutions that either disable nested parallelism or always allow it, but I am looking for a way to enable it only when the number of threads already launched is below the number of cores.

Is there an OpenMP solution for that using tasks?

Upvotes: 3

Views: 1337

Answers (4)

Justin Wrobel

Reputation: 2039

Does taskloop address your actual (unsimplified) issue? The third code block on this page shows it in use, and below is your code updated with it:

#include <iostream>
#include <cmath>

void subproblem(int m) {
  // Each thread of the enclosing parallel region generates tasks here,
  // and any thread in the team can execute them, so no nested parallel
  // region is needed.
  #pragma omp taskloop
  for (int j{0}; j < m; ++j) {
    double sum{0.0};
    for (int k{0}; k < 10000000; ++k) {
      sum += std::cos(static_cast<double>(k));
    }
    #pragma omp critical
    { std::cout << "Sum: " << sum << std::endl; }
  }
}

int main(int argc, const char *argv[]) {
  int n{2};
  int m{8};

  #pragma omp parallel for
  for (int i{0}; i < n; ++i) {
    subproblem(m);
  }

  return 0;
}

Upvotes: 0

Gilles

Reputation: 9519

Doesn't the if clause of the parallel construct just do it all for you? Here is what the 4.0 OpenMP standard says on page 44:

The syntax of the parallel construct is as follows:

#pragma omp parallel [clause[ [, ]clause] ...] new-line structured-block

where clause is one of the following:
  if(scalar-expression)
  num_threads(integer-expression)
  default(shared | none)
  private(list)
  firstprivate(list)
  shared(list)
  copyin(list)
  reduction(reduction-identifier:list)
  proc_bind(master | close | spread)

I didn't try it, but I would guess that using the if clause with a condition on whether n is greater than the number of cores on your machine, just the way you described in your two bullet points, might just work... Would you care to give it a try and let us know?

Upvotes: 1

Wyzard

Reputation: 34591

Rather than using a pair of nested parallel sections, you can tell OpenMP to "collapse" the nested loops into a single parallel section over the n*m iteration space:

#pragma omp parallel for collapse(2)
for (int i{0}; i < n; ++i) {
  for (int j{0}; j < m; ++j) {
    // ...
  }
}

This will allow it to divide the work appropriately regardless of the relative values of n and m.

Upvotes: 5

Roman Zaitsev

Reputation: 1458

OMP_NUM_THREADS - Specifies the default number of threads to use in parallel regions. The value of this variable shall be a comma-separated list of positive integers; each value specifies the number of threads to use at the corresponding nesting level. If undefined, one thread per CPU is used. (from here)

omp_get_max_threads - maximum number of threads that are available to do work (from here)

omp_get_num_threads - number of threads in the current team (from here)

But AFAIK there is no function that returns the total number of threads currently running across all regions, which is what you request:

I don't want the total number of threads to exceed the number of cores on my machine

Also look at this question

Upvotes: 1
