NPS
NPS

Reputation: 6355

What's the correct way of waiting for detached threads to finish?

Look at this sample code:

void OutputElement(int e, int delay)
{
    this_thread::sleep_for(chrono::milliseconds(100 * delay));
    cout << e << '\n';
}

void SleepSort(int v[], uint n)
{
    for (uint i = 0 ; i < n ; ++i)
    {
        thread t(OutputElement, v[i], v[i]);
        t.detach();
    }
}

It starts n new threads and each one sleeps for some time before outputting a value and finishing. What's the correct/best/recommended way of waiting for all threads to finish in this case? I know how to work around this but I want to know what's the recommended multithreading tool/design that I should use in this situation (e.g. condition_variable, mutex etc...)?

Upvotes: 3

Views: 8027

Answers (2)

Howard Hinnant
Howard Hinnant

Reputation: 218700

And now for the slightly dissenting answer. And I do mean slightly because I mostly agree with the other answer and the comments that say "don't detach, instead join."

First imagine that there is no join(). And that you have to communicate among your threads with a mutex and condition_variable. This really isn't that hard nor complicated. And it allows an arbitrarily rich communication, which can be anything you want, as long as it is only communicated while the mutex is locked.

Now a very common idiom for such communication would simply be a state that says "I'm done". Child threads would set it, and the parent thread would wait on the condition_variable until the child said "I'm done." This idiom would in fact be so common as to deserve a convenience function that encapsulated the mutex, condition_variable and state.

join() is precisely this convenience function.

But imho one has to be careful. When one says: "Never detach, always join," that could be interpreted as: Never make your thread communication more complicated than "I'm done."

For a more complex interaction between parent thread and child thread, consider the case where a parent thread launches several child threads to go out and independently search for the solution to a problem. When the problem is first found by any thread, that gets communicated to the parent, and the parent can then take that solution, and tell all the other threads that they don't need to search any more.

For example:

#include <chrono>
#include <iostream>
#include <iterator>
#include <random>
#include <thread>
#include <vector>

void OneSearch(int id, std::shared_ptr<std::mutex> mut,
                   std::shared_ptr<std::condition_variable> cv,
                   int& state, int& solution)
{
    std::random_device seed;
//     std::mt19937_64 eng{seed()};
    std::mt19937_64 eng{static_cast<unsigned>(id)};
    std::uniform_int_distribution<> dist(0, 100000000);
    int test = 0;
    while (true)
    {
        for (int i = 0; i < 100000000; ++i)
        {
            ++test;
            if (dist(eng) == 999)
            {
                std::unique_lock<std::mutex> lk(*mut);
                if (state == -1)
                {
                    state = id;
                    solution = test;
                    cv->notify_one();
                }
                return;
            }
        }
        std::unique_lock<std::mutex> lk(*mut);
        if (state != -1)
            return;
    }
}

auto findSolution(int n)
{
    std::vector<std::thread> threads;
    auto mut = std::make_shared<std::mutex>();
    auto cv = std::make_shared<std::condition_variable>();
    int state = -1;
    int solution = -1;
    std::unique_lock<std::mutex> lk(*mut);
    for (uint i = 0 ; i < n ; ++i)
        threads.push_back(std::thread(OneSearch, i, mut, cv,
                          std::ref(state), std::ref(solution)));
    while (state == -1)
        cv->wait(lk);
    lk.unlock();
    for (auto& t : threads)
        t.join();
    return std::make_pair(state, solution);
}

int
main()
{
    auto p = findSolution(5);
    std::cout << '{' << p.first << ", " << p.second << "}\n";
}

Above I've created a "dummy problem" where a thread searches for how many times it needs to query a URNG until it comes up with the number 999. The parent thread puts 5 child threads to work on it. The child threads work for awhile, and then every once in a while, look up and see if any other thread has found the solution yet. If so, they quit, else they keep working. The main thread waits until solution is found, and then joins with all the child threads.

For me, using the bash time facility, this outputs:

$ time a.out
{3, 30235588}

real    0m4.884s
user    0m16.792s
sys 0m0.017s

But what if instead of joining with all the threads, it detached those threads that had not yet found a solution. This might look like:

    for (unsigned i = 0; i < n; ++i)
    {
        if (i == state)
            threads[i].join();
        else
            threads[i].detach();
    }

(in place of the t.join() loop from above). For me this now runs in 1.8 seconds, instead of the 4.9 seconds above. I.e. the child threads are not checking with each other that often, and so main just detaches the working threads and lets the OS bring them down. This is safe for this example because the child threads own everything they are touching. Nothing gets destructed out from under them.

One final iteration can be realized by noticing that even the thread that finds the solution doesn't need to be joined with. All of the threads could be detached. The code is actually much simpler:

auto findSolution(int n)
{
    auto mut = std::make_shared<std::mutex>();
    auto cv = std::make_shared<std::condition_variable>();
    int state = -1;
    int solution = -1;
    std::unique_lock<std::mutex> lk(*mut);
    for (uint i = 0 ; i < n ; ++i)
        std::thread(OneSearch, i, mut, cv,
                          std::ref(state), std::ref(solution)).detach();
    while (state == -1)
        cv->wait(lk);
    return std::make_pair(state, solution);
}

And the performance remains at about 1.8 seconds.

There is still (sort of) an effective join with the solution-finding thread here. But it is accomplished with the condition_variable::wait instead of with join.

thread::join() is a convenience function for the very common idiom that your parent/child thread communication protocol is simply "I'm done." Prefer thread::join() in this common case as it is easier to read, and easier to write.

However don't unnecessarily constrain yourself to such a simple parent/child communication protocol. And don't be afraid to build your own richer protocol when the task at hand needs it. And in this case, thread::detach() will often make more sense. thread::detach() doesn't necessarily imply a fire-and-forget thread. It can simply mean that your communication protocol is more complex than "I'm done."

Upvotes: 9

Kerrek SB
Kerrek SB

Reputation: 476950

Don't detach, but instead join:

std::vector<std::thread> ts;

for (unsigned int i = 0; i != n; ++i)
    ts.emplace_back(OutputElement, v[i], v[i]);

for (auto & t : threads)
    t.join();

Upvotes: 5

Related Questions