Damir Tenishev

Reputation: 3402

Two threads alternating (ping-pong) execution

Is it possible to guarantee alternating execution of two threads without using two atomics (or other primitives such as semaphores)? I want to make sure that both threads execute; preferably, if one thread has been waiting, it should go first once the other has finished its current iteration.

A plain std::mutex doesn't work, since one thread can monopolize it and never give the other thread a chance to take it in time. The same issue occurs even with std::atomic_flag::wait.

The desired result can be achieved with two atomic flags, as in this demo:

#include <atomic>
#include <thread>
#include <iostream>
#include <vector>

std::atomic_flag ping_lock_flag = ATOMIC_FLAG_INIT;
std::atomic_flag pong_lock_flag = ATOMIC_FLAG_INIT;
std::atomic_int idx = 0;
std::vector<int> v(10);

void update()
{
    ping_lock_flag.wait(false, std::memory_order_relaxed);
    ping_lock_flag.clear(std::memory_order_release);

    std::cout << "update start\n";

    std::size_t index = idx.load();
    if (index < v.size()) {
        v[index] = 1;
        idx.fetch_add(1);
    }

    std::cout << "update end\n";

    pong_lock_flag.test_and_set(std::memory_order_acquire);
    pong_lock_flag.notify_one();
}

void render()
{
    // Some hard work before locking (prologue)
    // ...

    pong_lock_flag.wait(false, std::memory_order_relaxed);
    pong_lock_flag.clear(std::memory_order_release);

    std::cout << "render start\n";

    std::size_t index = idx.load();
    if (index < v.size()) {
        v[index] = 2;
        idx.fetch_add(1);
    }

    std::cout << "render end\n";

    ping_lock_flag.test_and_set(std::memory_order_acquire);
    ping_lock_flag.notify_one();

    // Some hard work after locking (epilogue)
    // ...
}

int main()
{
    std::jthread update_thread([&]() { while (idx.load() < v.size()) { update(); } });
    std::jthread render_thread([&]() { while (idx.load() < v.size()) { render(); } });

    ping_lock_flag.test_and_set(std::memory_order_acquire);

    update_thread.join();
    render_thread.join();

    for (auto& value : v) {
        std::cout << value << ", ";
    }

    std::cout << '\n';
}

with the output:

update start
update end
render start
render end
update start
update end
render start
render end
update start
update end
render start
render end
update start
update end
render start
render end
update start
update end
render start
render end
update start
update end
1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 

The question is if it is possible to get the same result simpler with modern C++ multithreading tools and if not, why?

Reasoning

The purpose is to have operations like update and render in separate threads, but synchronized. Why separate threads in this case? Because the synchronization covers only the work on internal data, while some framework render operations, such as clearing the device or copying the results to video memory, can run in parallel with the update. I marked these parts as prologue and epilogue in the render code above. The desire to run this code in parallel with update is the key reason for splitting the work into two threads here. If there is a better way to do so, please advise.

Without ping-pong there is a good chance that the code will only ever run update and never render, or vice versa.

With this I can relax the requirements somewhat; namely, I am fine with some spurious wake-ups, or with very rarely missing a wake-up, as long as it doesn't lead to fully stalling one or both threads.

Update

On second thought I came to the following solution, but I am still not sure that it is safe (taking into account all the reservations from the Reasoning section).

Here is the demo:

#include <atomic>
#include <thread>
#include <iostream>
#include <vector>

std::atomic_flag ping_lock_flag = ATOMIC_FLAG_INIT;
std::atomic_int idx = 0;
std::vector<int> v(10);

void update()
{
    ping_lock_flag.wait(false, std::memory_order_relaxed);

    std::cout << "update start\n";

    std::size_t index = idx.load();
    if (index < v.size()) {
        v[index] = 1;
        idx.fetch_add(1);
    }

    std::cout << "update end\n";

    ping_lock_flag.clear(std::memory_order_release);
    ping_lock_flag.notify_one();
}

void render()
{
    // Some hard work before locking (prologue)
    // ...

    ping_lock_flag.wait(true, std::memory_order_relaxed);

    std::cout << "render start\n";

    std::size_t index = idx.load();
    if (index < v.size()) {
        v[index] = 2;
        idx.fetch_add(1);
    }

    std::cout << "render end\n";

    ping_lock_flag.test_and_set(std::memory_order_acquire);
    ping_lock_flag.notify_one();

    // Some hard work after locking (epilogue)
    // ...
}

int main()
{
    std::jthread update_thread([&]() { while (idx.load() < v.size()) { update(); } });
    std::jthread render_thread([&]() { while (idx.load() < v.size()) { render(); } });

    ping_lock_flag.test_and_set(std::memory_order_acquire);

    update_thread.join();
    render_thread.join();

    for (auto& value : v) {
        std::cout << value << ", ";
    }

    std::cout << '\n';
}

It works, but are there any issues I might have missed?

Update 2

I found an issue with the approach suggested above. It enforces a "hard ping-pong": render can pass only after update, and vice versa. There is no chance of getting update-update-render in case update is faster. It seems that I have to rework the requirements. The key point is that if update is N times faster than render, I want `update` to pass approximately N times more often than `render`, but still give `render` a chance when it is ready.

Update 3

This solution should work:

#include <atomic>
#include <chrono>
#include <thread>
#include <iostream>
#include <vector>

std::atomic_flag render_update_lock = ATOMIC_FLAG_INIT;
std::atomic_flag render_thread_ready = ATOMIC_FLAG_INIT;
std::atomic_int idx = 0;
std::vector<int> v(10);

using namespace std::chrono_literals;

void update()
{
    if (!render_thread_ready.test()) {
        render_update_lock.wait(true, std::memory_order_relaxed);
        render_update_lock.test_and_set(std::memory_order_acquire);

        std::cout << "update start\n";

        std::size_t index = idx.load();
        if (index < v.size()) {
            v[index] = 1;
            idx.fetch_add(1);
        }

        std::cout << "update end\n";

        render_update_lock.clear(std::memory_order_release);
        render_update_lock.notify_one();
    }

    std::this_thread::sleep_for(1us);
}

void render()
{
    render_thread_ready.test_and_set(std::memory_order_acquire);

    render_update_lock.wait(true, std::memory_order_relaxed);
    render_update_lock.test_and_set(std::memory_order_acquire);

    render_thread_ready.clear(std::memory_order_release);

    std::cout << "render start\n";

    std::size_t index = idx.load();
    if (index < v.size()) {
        v[index] = 2;
        idx.fetch_add(1);
    }

    render_update_lock.clear(std::memory_order_release);
    render_update_lock.notify_one();

    std::cout << "render end\n";

    std::this_thread::sleep_for(2us);
}

int main()
{
    std::jthread update_thread([&]() { while (idx.load() < v.size()) { update(); } });
    std::jthread render_thread([&]() { while (idx.load() < v.size()) { render(); } });

    update_thread.join();
    render_thread.join();

    for (auto& value : v) {
        std::cout << value << ", ";
    }

    std::cout << '\n';
}

I have not tested it hard, though, so I may have missed some potential issues.

Upvotes: 0

Views: 139

Answers (1)

Mattia Piras

Reputation: 86

You can use condition variables, which are synchronization primitives that work paired with mutexes. The idea is that inside a critical section, a thread can wait on a condition variable until a given predicate (or condition) is satisfied. The thread will be notified by another thread when it can execute again. In this case you can use the condition variable methods wait and notify_one to alternate the two threads' execution. Whenever a thread waits on a condition variable, the mutex it holds is released, which solves the problem of one thread constantly holding the mutex over the other one.

This is also preferable over atomic variables, because it involves passive (blocking) waiting rather than busy-waiting.

Since we keep the critical section (mutex), there is also no need for atomic variables to guarantee consistency.

#include <thread>
#include <iostream>
#include <vector>
#include <mutex>
#include <condition_variable>

std::mutex mtx;                          // Mutex to wait with cond_vars
std::condition_variable cv_ping;         // Condition variable for thread1
std::condition_variable cv_pong;         // Condition variable for thread2
std::size_t idx = 0;                     // Shared index for the vector
std::vector<int> v(10);

void update()
{
    // Wait until it's thread 1's turn to run
    std::unique_lock<std::mutex> lock(mtx);
    cv_ping.wait(lock, []() { return idx % 2 == 0; });

    std::cout << "update start\n";

    // If there is room, update the vector and increment the index
    if (idx < v.size()) {
        v[idx] = 1;
        idx++;
    }

    std::cout << "update end\n";

    // Notify the second thread that it's its turn
    cv_pong.notify_one();
}

void render()
{
    // Wait until it's thread 2's turn to run
    std::unique_lock<std::mutex> lock(mtx);
    cv_pong.wait(lock, []() { return idx % 2 == 1; });

    std::cout << "render start\n";

    // If there is room, update the vector and increment the index
    if (idx < v.size()) {
        v[idx] = 2;
        idx++;
    }

    std::cout << "render end\n";

    // Notify the first thread that it's its turn
    cv_ping.notify_one();
}

// Reading idx outside the mutex would be a data race, so the loop
// condition is checked under the lock.
bool has_work()
{
    std::lock_guard<std::mutex> lock(mtx);
    return idx < v.size();
}

int main()
{
    // Start two threads that will run their respective functions in a loop
    std::thread update_thread([&]() { while (has_work()) { update(); } });
    std::thread render_thread([&]() { while (has_work()) { render(); } });

    // Initially allow thread1 to run first
    {
        std::lock_guard<std::mutex> lock(mtx);
        cv_ping.notify_one();
    }

    // Wait for both threads to finish
    update_thread.join();
    render_thread.join();

    //Cleanup
}

From this approach you can easily extend the ping-pong to an arbitrary number of alternating threads. Use one condition variable per thread; holding them in a vector may be easiest. Whenever a given thread i (which waits on the condition variable at index i) is done executing, it notifies the thread at index (i + 1) % vector_length (modulo the length of the vector of condition variables, so that the last thread notifies the first). The condition variable semantics guarantee that a thread that wakes up has to first acquire the mutex, so mutual exclusion over the critical section is preserved.

A much more straightforward approach would be to use only one condition variable and have a thread call notify_all to wake up all the threads suspended on it. After all threads wake up, all of them besides one (the one whose turn it really is) will go back to waiting. Despite its simplicity, I wouldn't suggest this, as it creates a lot more contention on the mutex, while the previous scheme ensures that contention can only happen between one pair of threads at a time (so it is a lot smaller).

Hope I explained myself clearly :)

Upvotes: 1
