user11383585

std::thread runs A LOT slower than std::future

I have a simple rendering program with a main loop that runs at about 8000 fps on one thread (it does nothing except draw a background). I wanted to see whether rendering from another thread would upset the current context without changing it (to my surprise, it didn't). I did this with the following code:

m_Thread = std::thread(Mainloop);
m_Thread.join();

Somehow this ran extremely slowly, at ~30 fps. I thought this was weird, and I remembered that in another project I had used std::future for a similar performance-related reason. So I tried it with std::future using the following code:

m_Future = std::async(std::launch::async, Mainloop);
m_Future.get();

and this runs just a tiny bit below the single-threaded performance (~7900 fps). Why is std::thread so much slower than std::future?

Edit:

Disregard the above code; here is a minimal reproducible example. Just toggle THREAD between 0 and 1 to compare:

#include <future>
#include <thread>   // std::thread (used when THREAD is 1)
#include <chrono>
#include <Windows.h>
#include <iostream>
#include <string>

#define THREAD 1

static void Function()
{
    
}

int main()
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point finish = std::chrono::high_resolution_clock::now();
    long double difference = 0;
    long long unsigned int fps = 0;

#if THREAD
    std::thread worker;
#else
    std::future<void> worker;
#endif

    while (true)
    {
        //FPS 
        finish = std::chrono::high_resolution_clock::now();
        difference = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
        difference = difference / 1000000000;
        if (difference > 0.1) {
            start = std::chrono::high_resolution_clock::now();
            std::wstring fpsStr = L"Fps: ";
            fpsStr += std::to_wstring(fps);
            SetConsoleTitleW(fpsStr.c_str()); // wide version, matches std::wstring even without UNICODE defined
            fps = 0;
        }
        
#if THREAD
        worker = std::thread(Function);
        worker.join();
#else
        worker = std::async(std::launch::async, Function);
        worker.get();
#endif

        fps += 10;
    }

    return 0;
}

Upvotes: 1

Views: 1218

Answers (2)

Yakk - Adam Nevraumont

Reputation: 275878

In some versions of the MSVC C++ standard library, std::async pulls from a (system) thread pool, while std::thread does not. This can cause problems: I have exhausted that pool in the past and gotten deadlocks. It also means that casual use is faster.

My advice is to write your own thread pool on top of std::thread and use that. You'll have full control over how many threads you have active.
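As a rough illustration of what such a pool could look like, here is a minimal sketch. The class name SimplePool and its submit member are made up for this example, and it leaves out everything a production pool needs (exception policy, work stealing, reuse of the calling thread). It only shows the core idea: a fixed number of std::thread workers created once and fed from a queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed number of std::thread workers
// pulling type-erased jobs from one mutex-protected queue.
class SimplePool {
public:
    explicit SimplePool(std::size_t threadCount) {
        assert(threadCount > 0);
        for (std::size_t i = 0; i < threadCount; ++i)
            m_workers.emplace_back([this] { workerLoop(); });
    }

    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopping = true;
        }
        m_wake.notify_all();
        for (std::thread& t : m_workers) t.join();
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Queue a callable that takes no arguments; the future yields its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.emplace([task] { (*task)(); });
        }
        m_wake.notify_one();
        return result;
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_stopping || !m_jobs.empty(); });
                if (m_jobs.empty()) return;  // stopping and the queue is drained
                job = std::move(m_jobs.front());
                m_jobs.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::function<void()>> m_jobs;
    bool m_stopping = false;
    std::vector<std::thread> m_workers;
};
```

With something like this, the loop from the question would create the pool once and call pool.submit(Function).get() each frame, so the thread creation cost is paid once rather than per iteration.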

This is a hard problem to get right, but depending on someone else solving it doesn't work, because honestly the standard library implementations I have used do not reliably solve it.

Note that in an N-sized thread pool, a blocking dependency chain of size N will deadlock. If you make the number of threads equal to the number of CPUs and don't reliably reuse the calling thread, you'll find that multithreaded code tested on 4+ core machines often deadlocks on 2-core machines.
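The smallest case of this is N = 1: a job running on the only worker waits for a second job that can never start. The sketch below shows that under stated assumptions. OneWorker and nestedWaitStalls are hypothetical names, and the wait uses a timeout purely so the demonstration terminates; with a plain get() it would hang.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical one-worker executor, just large enough to show the stall.
class OneWorker {
public:
    OneWorker() : m_thread([this] { run(); }) {}
    ~OneWorker() {
        post(nullptr);  // an empty job tells the worker to exit
        m_thread.join();
    }
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }
        m_wake.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_wake.wait(lock, [this] { return !m_jobs.empty(); });
            std::function<void()> job = std::move(m_jobs.front());
            m_jobs.pop();
            lock.unlock();
            if (!job) return;
            job();
        }
    }
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::function<void()>> m_jobs;
    std::thread m_thread;  // declared last so the queue exists before it starts
};

// Returns true when the outer job timed out waiting for the inner job,
// i.e. when a blocking chain of length 2 met a pool of size 1.
bool nestedWaitStalls() {
    OneWorker pool;
    auto outerDone = std::make_shared<std::promise<bool>>();
    std::future<bool> stalled = outerDone->get_future();

    pool.post([&pool, outerDone] {
        auto innerDone = std::make_shared<std::promise<void>>();
        std::future<void> inner = innerDone->get_future();
        pool.post([innerDone] { innerDone->set_value(); });
        // The inner job is queued behind this one, so it cannot run yet.
        bool timedOut = inner.wait_for(std::chrono::milliseconds(100)) ==
                        std::future_status::timeout;
        outerDone->set_value(timedOut);
    });
    return stalled.get();
}
```

The same shape scales up: N workers each blocked on a job that is still in the queue leaves nobody to run the queue.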

At the same time, if you make a thread pool for each task, and they stack, you'll end up thrashing the CPU.

Note that the standard is annoyingly vague about how many threads you can actually expect to run. While std::async has to behave "as if" you made a new std::thread, in practice that just means implementations have to reinitialize and destroy any thread_local objects.
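To make the thread_local point concrete, here is a small sketch of the baseline behavior with std::thread: every new thread constructs its own copy of a thread_local object on first use and destroys it on exit. A pooled std::async(std::launch::async, ...) has to reproduce this observable effect per task. The names Tracker, touchThreadLocal and constructionsFor are invented for the example.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> g_constructed{0};
std::atomic<int> g_destroyed{0};

// Counts how many per-thread instances are created and destroyed.
struct Tracker {
    Tracker() { ++g_constructed; }
    ~Tracker() { ++g_destroyed; }
    int uses = 0;
};

void touchThreadLocal() {
    thread_local Tracker tracker;  // constructed on first use in each thread
    ++tracker.uses;
}

// Runs `count` short-lived threads one after another and returns how many
// Tracker objects were constructed while doing so.
int constructionsFor(int count) {
    const int before = g_constructed.load();
    for (int i = 0; i < count; ++i) {
        std::thread worker(touchThreadLocal);
        worker.join();  // the thread's Tracker is destroyed as the thread exits
    }
    return g_constructed.load() - before;
}
```

Each std::thread gets a fresh Tracker, so the number of constructions matches the number of threads started.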

There are eventual progress guarantees in the standard, but I have seen them violated in actual implementations when using std::async. So I now avoid using it directly.

Upvotes: 0

Dmitry Kuzminov

Reputation: 6594

std::async can be implemented in different ways. For example, there can be a pre-allocated pool of threads, and each time you use std::async in a loop you just reuse a "hot" thread from the pool.

std::thread creates a new system thread object each time you use it. That can be significant overhead compared to reusing a thread from the pool.
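One way to check this on a given toolchain is to time both launch styles directly. The following is only a portable measurement sketch with hypothetical helper names (work, secondsWithThread, secondsWithAsync). It makes no claim about which one wins: that depends on whether the standard library in use backs std::async with a pool.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

std::atomic<int> g_workCalls{0};

// Stand-in for the empty Function() from the question.
void work() { ++g_workCalls; }

// Seconds spent creating and joining one std::thread per iteration.
double secondsWithThread(int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread worker(work);
        worker.join();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Seconds spent launching and waiting on one std::async task per iteration.
double secondsWithAsync(int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::async(std::launch::async, work).get();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling both with the same iteration count (say, a few thousand) and comparing the two results gives a per-launch cost for each approach on that particular platform.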

I would advise you to test your code in a multithreaded environment where std::async may start competing for the pre-allocated system objects.

Upvotes: 0
