Reputation: 13535
I am not sure why OpenMP creates so many threads. It does not seem to be specific to the Microsoft implementation, because the Intel library shows the same behavior. The parallel sections in my code are compute bound and should not create or use more threads than I have cores. What I observe instead is that for n initiating threads OpenMP creates n * cores threads. This looks like a big thread leak to me.
If I run a "small" 32-bit application on a server, it can fail because 1000 OpenMP threads already need 2 GB of address space, leaving no memory for the application itself. That should not happen. I would expect a state-of-the-art thread pool to reuse its threads and to retire threads that are no longer used.
I have tried omp_set_num_threads(8) to limit the thread pool to my 8 cores, but that only seems to limit the number of threads per initiating thread. Am I doing it all wrong, or is OpenMP not meant to be used this way?
On my 8-core machine, starting 5 threads in my AsyncWorker class below causes OpenMP to create 38 threads. I would expect only 8 threads to be created and reused across all 5 initiating threads.
#include <atomic>
#include <thread>
#include <omp.h>
#include <chrono>
#include <cstdlib>
#include <vector>
#include <memory>

class AsyncWorker {
private:
    std::vector<std::thread> threads;

public:
    AsyncWorker()
    {
    }

    void start() // add one thread that starts an OpenMP parallel section
    {
        threads.push_back(std::thread(&AsyncWorker::threadFunc, this));
    }

    ~AsyncWorker()
    {
        for (auto &t : threads)
        {
            t.join();
        }
    }

private:
    void threadFunc()
    {
        std::atomic<int> counter{0};
        auto start = std::chrono::high_resolution_clock::now();
        std::chrono::milliseconds durationInMs{0};
        while (durationInMs.count() < 5000l)
        {
            // Each instance seems to get its own thread pool.
            // Why? And how can I limit the thread pool to the number of cores,
            // and when will the threads be closed?
            #pragma omp parallel
            {
                counter++;
                auto stop = std::chrono::high_resolution_clock::now();
                durationInMs = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
            }
        }
    }
};

int main() {
    //omp_set_dynamic(0);
    //omp_set_nested(0);
    //omp_set_num_threads(8);
    {
        AsyncWorker foo;
        foo.start(); // 1
        foo.start(); // 2
        foo.start(); // 3
        foo.start(); // 4
        foo.start(); // 5
        system("pause");
    }
    return 0;
}
Upvotes: 3
Views: 4229
Reputation: 74365
OpenMP is not meant to be used that way. Mixing OpenMP with other threading methods is a recipe for disaster unless done very carefully, and even then the results are unpredictable. The OpenMP standard deliberately stays away from defining that kind of interoperability, and vendors are free to provide it the way they see fit (if they see fit at all).
omp_set_num_threads(8) does not do what you think it does. It sets the number of threads for parallel regions encountered by the current thread when no num_threads() clause is present. Also, omp_set_nested(0) has (or might have) no effect since you are not starting the parallel regions from OpenMP threads but rather from C++11 threads. Setting a global limit on the total number of OpenMP threads is possible via the OMP_THREAD_LIMIT environment variable, but that is only available in OpenMP 3.0 and later, and MSVC is (forever?) stuck in the OpenMP 2.0 era.
Possible courses of action are: limit the team size from within each initiating thread (via omp_set_num_threads() called on that thread, or a num_threads clause), set OMP_THREAD_LIMIT with a compiler that supports OpenMP 3.0 or later, or avoid mixing C++11 threads with OpenMP parallel regions altogether.
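For illustration, a minimal sketch of the first option, capping the team size per encountering thread with a num_threads clause; the team size of 2 and the 5 worker threads are arbitrary values chosen to mirror the question:

#include <cstdio>
#include <omp.h>
#include <thread>
#include <vector>

// Each encountering (C++11) thread gets its own OpenMP team; the num_threads
// clause only caps the size of that particular team.
static void threadFunc()
{
    #pragma omp parallel num_threads(2) // 2 is an arbitrary example value
    {
        std::printf("member %d of a team of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
}

int main()
{
    std::vector<std::thread> workers;
    for (int i = 0; i < 5; ++i) // 5 initiating threads, as in the question
        workers.emplace_back(threadFunc);
    for (auto &t : workers)
        t.join();
    // At most 5 * 2 = 10 OpenMP worker threads in total instead of 5 * cores,
    // but the five teams are still separate and not reused across workers.
    return 0;
}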
Upvotes: 5
Reputation: 138
The number of threads OpenMP uses is set per parallel region, and you are concurrently spawning 5 parallel regions. With the default team size of 8 on your machine, that gives 5 * 8 = 40 threads.
It seems that you are looking for task-based parallelism. In OpenMP you can achieve that by starting a single parallel region and then creating tasks as needed. Off the top of my head, code for this pattern looks like this:
// Start parallel region
#pragma omp parallel
{
    // Only let a single thread create the tasks
    #pragma omp single
    {
        for (int i = 0; i < 40; i++)
        {
            // Actually create the task that needs to be performed
            #pragma omp task
            {
                heavy_work();
            }
        }
    }
}
This way you would only have 8 threads working in parallel.
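For completeness, here is a self-contained sketch of that pattern, with a trivial placeholder heavy_work() and 5 jobs to mirror the question's five start() calls; note that tasks require OpenMP 3.0 or later, which, as the other answer points out, MSVC does not provide.

#include <cstdio>
#include <omp.h>

// Placeholder for the real per-job work (e.g. one former start() call).
static void heavy_work(int job)
{
    std::printf("job %d ran on OpenMP thread %d\n", job, omp_get_thread_num());
}

int main()
{
    // A single parallel region owns the one team (8 threads on an 8-core machine).
    #pragma omp parallel
    {
        // Only one thread creates the tasks; the whole team executes them.
        #pragma omp single
        {
            for (int job = 0; job < 5; ++job)      // the five former start() calls
            {
                #pragma omp task firstprivate(job) // each task gets its own copy of job
                heavy_work(job);
            }
        }
    } // implicit barrier: all tasks have finished here
    return 0;
}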
Upvotes: 3