Reputation: 186
As the title says, I'd like to know whether tasks run with std::async
can "reuse" idle threads.
For example, take the following code:
auto task = []() { std::this_thread::sleep_for(std::chrono::seconds(20)); };
int tasksCount = 160;
std::vector<std::future<void>> futures;
for (int i = 0; i < tasksCount; ++i)
{
    futures.push_back(std::async(task));
}
So we have a lot of tasks (160) running in parallel, all of which do nothing. When this code runs on Windows, it generates 161 waiting threads.
Isn't that too many threads for doing nothing? Why can't waiting threads be "reused"?
Upvotes: 0
Views: 1239
Reputation: 275740
A thread, roughly, is a CPU state and reserved memory space for a stack, plus an entry in an OS scheduler. The C++ language also has information about per-thread state (thread_local), and helper libraries may also have some state.
These are reasonably expensive. This information cannot be shared between threads; each thread actually has a different stack, a different set of thread_local state, different register values, etc.
Now, when a thread isn't executing, it is just an entry in a table. No CPU resources (other than those caused by a larger table) are spent on the thread. So you pay a large setup cost: a bunch of threads are started, and then they go to sleep. The scheduler doesn't return to those threads until the sleep time they asked for has elapsed.
So at the hardware level, they are sharing CPUs. But at the software level, their state isn't shared, and that is what you are seeing in the debugger.
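A rough way to see this sharing is to time a batch of sleeping tasks. The following is a minimal sketch, not a benchmark: the function name run_sleeping_tasks and its parameters are made up for illustration, and it passes std::launch::async explicitly (an assumption that differs from the question's code, which uses the default policy) so that each task is not deferred. If sleeping threads cost almost no CPU, the batch should finish in roughly one sleep period rather than the sum of all of them.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: launches `count` tasks that only sleep for `nap`,
// waits for all of them, and returns the total wall-clock time.
std::chrono::milliseconds run_sleeping_tasks(int count,
                                             std::chrono::milliseconds nap) {
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::future<void>> futures;
    futures.reserve(count);
    for (int i = 0; i < count; ++i) {
        // std::launch::async forces asynchronous execution instead of
        // leaving the choice (async or deferred) to the implementation.
        futures.push_back(std::async(std::launch::async, [nap] {
            std::this_thread::sleep_for(nap);
        }));
    }
    for (auto& f : futures) {
        f.get();  // block until this task has finished sleeping
    }

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
}
```

Calling run_sleeping_tasks(32, std::chrono::milliseconds(50)) would be expected to take far less than the 1600 ms that 32 sequential sleeps would need, because the sleeping threads overlap instead of queuing up for a core.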
Upvotes: 2
Reputation: 179991
The sharing does happen, but at core level, not thread level. Since your threads are doing virtually no computation, it's likely all 160 threads can share a single CPU core.
Fundamentally, a thread holds a call stack, with the local variables of each function invocation. This stack can't really be shared - the fundamental property of a call stack is that the top function is the one actively executing. In your example, you have 160 sleep_for calls on top of 160 stacks.
Upvotes: 3
Reputation: 11430
The important question is: what observable difference would it make to your program? The standard doesn't speak to what happens at a lower system level. It only talks about observable behaviour. There's no gain there; the only observable difference could be an unexpected mix-up of thread-local storage variables.
Consider the complexity:
So, in short, it would offer no visible benefit, could break thread-local storage (depending on how it is stated in the spec), and would be a major pain to implement, all just for the sake of reducing the number of threads at a lower level.
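The thread-local concern can be made concrete with a small sketch. Everything here is hypothetical and only for illustration: tls_calls, task, run_on_fresh_threads and run_on_reused_thread are made-up names, and the "reused" case is simulated by running all tasks on one std::thread rather than by any real std::async implementation. A task that reads thread_local state sees a fresh value on a new thread, but sees leftovers from earlier tasks when a thread is reused.

```cpp
#include <thread>
#include <vector>

// Hypothetical per-thread state: counts how many tasks ran on this thread.
thread_local int tls_calls = 0;

// Hypothetical task: bumps the thread-local counter and reports what it saw.
int task() { return ++tls_calls; }

// Runs each task on its own new thread, one after another.
std::vector<int> run_on_fresh_threads(int n) {
    std::vector<int> seen(n);
    for (int i = 0; i < n; ++i) {
        std::thread([&seen, i] { seen[i] = task(); }).join();
    }
    return seen;
}

// Runs all tasks on a single thread, simulating thread reuse.
std::vector<int> run_on_reused_thread(int n) {
    std::vector<int> seen(n);
    std::thread([&seen, n] {
        for (int i = 0; i < n; ++i) {
            seen[i] = task();  // thread_local state carries over between tasks
        }
    }).join();
    return seen;
}
```

With three tasks, the fresh-thread version records 1, 1, 1, while the reused-thread version records 1, 2, 3. That difference is the kind of observable change that reusing threads behind the program's back could introduce.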
Upvotes: 1