Reputation:
I have a simple rendering program with a Mainloop
that runs at about 8000 fps on one thread (it does nothing except draw a background). I wanted to see whether rendering from another thread would upset the current context without my changing it (to my surprise, it didn't). I did this with the following code:
m_Thread = std::thread(Mainloop);
m_Thread.join();
This somehow ran extremely slowly, at ~30 fps. I thought this was weird, and I remembered that in another project I had used std::future
for a similar performance-based reason. So I then tried it with std::future
using the following code:
m_Future = std::async(std::launch::async, Mainloop);
m_Future.get();
and this runs just a tiny bit below the single-threaded performance (~7900 fps). Why is std::thread
so much slower than std::future
?
Disregard the above code; here is a minimal reproducible example. Just toggle THREAD
to be either 0
or 1
to compare:
#include <future>
#include <thread> // std::thread (was missing)
#include <chrono>
#include <Windows.h>
#include <iostream>
#include <string>
#define THREAD 1 // 1: std::thread, 0: std::async
static void Function()
{
}
int main()
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point finish = std::chrono::high_resolution_clock::now();
    long double difference = 0;
    long long unsigned int fps = 0;
#if THREAD
    std::thread worker;
#else
    std::future<void> worker;
#endif
    while (true)
    {
        //FPS: refresh the console title every 0.1 s
        finish = std::chrono::high_resolution_clock::now();
        difference = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
        difference = difference / 1000000000;
        if (difference > 0.1) {
            start = std::chrono::high_resolution_clock::now();
            std::wstring fpsStr = L"Fps: ";
            fpsStr += std::to_wstring(fps);
            // Wide variant, so this compiles whether or not UNICODE is defined
            SetConsoleTitleW(fpsStr.c_str());
            fps = 0;
        }
#if THREAD
        worker = std::thread(Function);
        worker.join();
#else
        worker = std::async(std::launch::async, Function);
        worker.get();
#endif
        // The counter covers 0.1 s, so each iteration counts as 10 per second
        fps += 10;
    }
    return 0;
}
Upvotes: 1
Views: 1218
Reputation: 275878
In some versions of the MSVC C++ standard library, std::async
pulls from a (system) thread pool, while std::thread
does not. This can cause problems: I have exhausted the pool in the past and gotten deadlocks. It also means that casual use is faster.
My advice is to write your own thread pool on top of std::thread
and use that. You'll have full control over how many threads you have active.
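As a rough illustration of that advice, here is a minimal sketch of a fixed-size pool built on std::thread. The class name SimplePool and its submit() interface are hypothetical, not something from the question or from any library, and the sketch assumes tasks are nullary callables. It leaves out work stealing and any way for a blocked task to help run queued work, so a task that waits on another task queued in the same pool can still stall.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool; the name and interface are illustrative only.
class SimplePool {
public:
    explicit SimplePool(std::size_t threadCount) {
        assert(threadCount > 0 && "a pool needs at least one worker");
        for (std::size_t i = 0; i < threadCount; ++i)
            m_workers.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers drain the queue, then joins every one of them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopping = true;
        }
        m_wake.notify_all();
        for (std::thread& worker : m_workers)
            worker.join();
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Queues a nullary callable and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using Result = decltype(f());
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(f));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push([task] { (*task)(); });
        }
        m_wake.notify_one();
        return result;
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_stopping || !m_tasks.empty(); });
                if (m_tasks.empty())
                    return; // only reachable once m_stopping is set
                job = std::move(m_tasks.front());
                m_tasks.pop();
            }
            job(); // run outside the lock so other workers can dequeue
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::function<void()>> m_tasks;
    bool m_stopping = false;
    std::vector<std::thread> m_workers; // declared last: starts after the rest
};
```

With something like this, the loop in the question would create one SimplePool before the while loop and call pool.submit(Function).get() per iteration, so no thread is created inside the loop.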
This is a hard problem to get right, but depending on someone else to solve it doesn't work, because honestly the standard library implementations I have used do not reliably solve it.
Note that in an N-sized thread pool, a blocking dependency chain of size N will deadlock. If you make the number of threads equal to the number of CPUs and don't reliably reuse the calling thread, you'll find that multithreaded code tested on 4+ core machines often deadlocks on 2-core machines.
At the same time, if you make a thread pool for each task, and they stack, you'll end up thrashing the CPU.
Note that the standard is annoyingly vague about how many threads you can actually expect to run. While std::async has to behave "as if" you made a new std::thread, in practice that just means the implementation has to reinitialize and destroy any thread_local
objects.
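To make the thread_local point concrete, here is a small sketch of the behavior a fresh std::thread gives you, which is what a pooled std::async would have to reproduce. The names t_calls, bumpCounter, and bumpOnNewThread are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread counter used only for this illustration.
thread_local int t_calls = 0;

// Increments the calling thread's own copy and returns the new value.
int bumpCounter() { return ++t_calls; }

// Runs bumpCounter() once on a brand-new std::thread and returns what it saw.
int bumpOnNewThread() {
    int seen = 0;
    std::thread worker([&seen] { seen = bumpCounter(); });
    worker.join();
    assert(seen >= 1); // the worker ran bumpCounter() before join() returned
    return seen;
}
```

Every call to bumpOnNewThread() starts from a freshly initialized t_calls, regardless of how often the calling thread has incremented its own copy. A std::async implementation that recycles pooled threads has to present the same fresh state to each task.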
There are eventual progress guarantees in the standard, but I have seen them violated in actual implementations when using std::async
. So I now avoid using it directly.
Upvotes: 0
Reputation: 6594
std::async
can be implemented in different ways. For example, there can be a pre-allocated pool of threads, and each time you use std::async
in a loop you just reuse a "hot" thread from the pool.
std::thread
creates a new system thread object each time you use it. That may be a significant overhead compared to reusing a thread from the pool.
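One way to check how large that overhead is on a given toolchain is to time both round trips directly. The sketch below is a portable variant of the question's loop. The helper names averageMicroseconds, noop, spawnAndJoin, and asyncAndGet are hypothetical, and the numbers it produces depend entirely on the platform and standard library in use.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: runs `roundTrip` `iterations` times and returns the
// average cost of one call in microseconds.
template <class F>
double averageMicroseconds(F roundTrip, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        roundTrip();
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}

inline void noop() {}

// Creates and joins a fresh std::thread on every call.
inline void spawnAndJoin() {
    std::thread worker(noop);
    worker.join();
}

// Launches through std::async; whether a pooled thread is reused here
// depends on the standard library implementation.
inline void asyncAndGet() {
    std::async(std::launch::async, noop).get();
}
```

Calling averageMicroseconds(spawnAndJoin, 10000) and averageMicroseconds(asyncAndGet, 10000) and printing both values gives a per-call comparison. On an implementation that creates a new thread for every std::async call, the two numbers should come out close to each other.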
I would advise you to test your code in a multithreaded environment where std::async
may start competing for the pre-allocated system objects.
Upvotes: 0