Reputation: 11
I was wondering if there are performance benefits to using a pool of threads over simply creating threads and letting the OS queue and schedule them.
Say I have 20 available hardware threads and 60 tasks I want to run on them, and I have something like:
void someTask() {
    //...performs some task
}
// say std::thread::hardware_concurrency() = 20
std::vector<std::thread> threads;
for (int i = 0; i < 60; i++) {
    threads.push_back(std::thread(someTask));
}
std::for_each(threads.begin(),threads.end(),[](std::thread& x){x.join();});
Is there a benefit to instead creating a pool with 20 threads and giving each of these another 'task' when a thread becomes free? I assume that there is some overhead in spawning a thread, but are there other benefits to creating a pool for such a problem?
Upvotes: 1
Views: 1206
Reputation:
Creating a thread typically takes about 75k cycles (~20us).
Starting that thread can take around 200k cycles (~60us).
Waking up a thread takes about 15k cycles (~5us).
So it is worth pre-creating threads and just waking them up instead of creating new threads every time. The exact numbers depend on the hardware and the OS.
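As a rough illustration of "pre-create the threads and just wake them up", here is a minimal pool sketch. The names `ThreadPool`, `submit`, `workerLoop` and `runTasks` are made up for this example and are not from any library; a real pool would also want exception handling and a way to return results.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: workers are created once, sleep on a
// condition variable, and are woken whenever a task is queued.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t numWorkers) {
        for (std::size_t i = 0; i < numWorkers; ++i) {
            workers_.emplace_back([this] { workerLoop(); });
        }
    }

    // Lets the workers finish every queued task, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a task and wakes one sleeping worker.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                // The predicate handles spurious and early notifications.
                cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Example driver: runs numTasks trivial tasks on numWorkers threads and
// returns how many tasks completed.
int runTasks(std::size_t numWorkers, int numTasks) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(numWorkers);
        for (int i = 0; i < numTasks; ++i) {
            pool.submit([&done] { done.fetch_add(1); });
        }
    }  // the destructor drains the queue and joins the workers
    return done.load();
}
```

Calling `runTasks(20, 60)` would mirror the scenario in the question: the thread-creation cost is paid 20 times rather than 60, and each later task only costs a wake-up or a queue pop.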
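The following benchmark measures these costs:
#include <iostream>
#include <thread>
#include <cstdint>
#include <mutex>
#include <chrono>
#include <condition_variable>
// Reads the CPU timestamp counter (GCC/Clang builtin, x86 only).
uint64_t now() {
    return __builtin_ia32_rdtsc();
}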
uint64_t t0 = 0;
uint64_t t1 = 0;
uint64_t t2 = 0;
uint64_t t3 = 0;
uint64_t t4 = 0;
double sum01 = 0;
double sum02 = 0;
double sum34 = 0;
uint64_t count = 0;
std::mutex m;
std::condition_variable cv;
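bool ready = false;  // guarded by m; set when the worker should wake up
// Worker: records its start time, then blocks until the main thread signals it.
void run() {
    t1 = now();  // thread has started running
    std::unique_lock<std::mutex> lk(m);
    // The predicate guards against spurious wake-ups and against a
    // notification that arrives before this thread starts waiting.
    cv.wait(lk, [] { return ready; });
    t4 = now();  // thread has woken up
}
void create_thread() {
    ready = false;
    t0 = now();
    std::thread th( run );
    t2 = now();  // std::thread constructor has returned
    // Give the worker time to reach cv.wait() before waking it.
    std::this_thread::sleep_for( std::chrono::microseconds(100));
    {
        std::lock_guard<std::mutex> lk(m);
        ready = true;
    }
    t3 = now();
    cv.notify_one();
    th.join();
    count++;
    sum01 += (t1-t0);  // creation call until the thread runs
    sum02 += (t2-t0);  // cost of the creation call itself
    sum34 += (t4-t3);  // notify until the thread wakes
}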
int main() {
    const uint32_t numloops = 10;
    for ( uint32_t j=0; j<numloops; ++j ) {
        create_thread();
    }
    std::cout << "t01:" << sum01/count << std::endl;
    std::cout << "t02:" << sum02/count << std::endl;
    std::cout << "t34:" << sum34/count << std::endl;
}
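Typical result (average TSC cycles over 10 runs; t01 is the time from the creation call until the new thread starts running, t02 the time for the std::thread constructor to return, t34 the wake-up latency):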
Program returned: 0
t01:64614.4
t02:54655
t34:15758.4
Source: https://godbolt.org/z/recfjKe8x
Upvotes: 2