golightlydev

Reputation: 55

Which types of memory_order should be used for non-blocking behaviour with an atomic_flag?

Instead of having my threads wait, doing nothing, for other threads to finish using shared data, I'd like them to do something else in the meantime (like checking for input, or re-rendering the previous frame in the queue) and then come back to check whether the other thread is done with its task.

I think this code that I've written does that, and it "seems" to work in the tests I've performed, but I don't really understand how std::memory_order_acquire and std::memory_order_release work exactly, so I'd like some expert advice on whether I'm using them correctly to achieve the behaviour I want.

Also, I've never seen multithreading done this way before, which makes me a bit worried. Are there good reasons not to have a thread do other tasks instead of waiting?

/*test program
intended to test if atomic flags can be used to perform other tasks while shared
data is in use, instead of blocking

each thread enters the flag protected part of the loop 20 times before quitting
if the flag indicates that the if block is already in use, the thread is intended to
execute the code in the else block (only up to 5 times to avoid cluttering the output)

debug note: this doesn't work with std::cout because all the threads are using it at once
and it's not thread safe so it all gets garbled.  at least it didn't crash

real world usage
one thread renders and draws to the screen, while the other checks for input and
provides frameData for the renderer to use.  neither thread should ever block*/

#include <fstream>
#include <atomic>
#include <thread>
#include <string>

struct ThreadData {
    int numTimesToWriteToDebugIfBlockFile;
    int numTimesToWriteToDebugElseBlockFile;
};

class SharedData {
public:
    SharedData() {
        threadData = new ThreadData[10];
        for (int a = 0; a < 10; ++a) {
            threadData[a] = { 20, 5 };
        }
        flag.clear();
    }

    ~SharedData() {
        delete[] threadData;
    }

    void runThread(int threadID) {
        while (this->threadData[threadID].numTimesToWriteToDebugIfBlockFile > 0) {
            if (!this->flag.test_and_set(std::memory_order_acquire)) {
                std::string fileName = "debugIfBlockOutputThread#";
                fileName += std::to_string(threadID);
                fileName += ".txt";
                std::ofstream writeFile(fileName.c_str(), std::ios::app);
                writeFile << threadID << ", running, output #" << this->threadData[threadID].numTimesToWriteToDebugIfBlockFile << std::endl;
                writeFile.close();
                writeFile.clear();
                this->threadData[threadID].numTimesToWriteToDebugIfBlockFile -= 1;
                this->flag.clear(std::memory_order_release);
            }
            else {
                if (this->threadData[threadID].numTimesToWriteToDebugElseBlockFile > 0) {
                    std::string fileName = "debugElseBlockOutputThread#";
                    fileName += std::to_string(threadID);
                    fileName += ".txt";
                    std::ofstream writeFile(fileName.c_str(), std::ios::app);
                    writeFile << threadID << ", standing by, output #" << this->threadData[threadID].numTimesToWriteToDebugElseBlockFile << std::endl;
                    writeFile.close();
                    writeFile.clear();
                    this->threadData[threadID].numTimesToWriteToDebugElseBlockFile -= 1;
                }
            }
        }
    }
private:
    ThreadData* threadData;
    std::atomic_flag flag;
};

void runThread(int threadID, SharedData* sharedData) {
    sharedData->runThread(threadID);
}

int main() {
    SharedData sharedData;
    std::thread thread[10];
    for (int a = 0; a < 10; ++a) {
        thread[a] = std::thread(runThread, a, &sharedData);
    }
    for (int a = 0; a < 10; ++a) {
        thread[a].join();
    }
    return 0;
}

Upvotes: 0

Views: 181

Answers (1)

Nate Eldredge

Reputation: 58142

The memory ordering you're using here is correct.

The acquire memory order when you test and set your flag (to take your hand-written lock) has the effect, informally speaking, of preventing any memory accesses of the following code from becoming visible before the flag is tested. That's what you want, because you want to ensure that those accesses are effectively not done if the flag was already set. Likewise, the release order on the clear at the end prevents any of the preceding accesses from becoming visible after the clear, which is also what you need so that they only happen while the lock is held.

However, it's probably simpler to just use a std::mutex. If you don't want to wait to take the lock, but instead do something else if you can't, that's what try_lock is for.

#include <mutex>

class SharedData {
    // ...
private:
    std::mutex my_lock;
};
// ...
if (my_lock.try_lock()) {
    // lock was taken, proceed with critical section
    my_lock.unlock();
} else {
    // lock not taken, do non-critical work
}

This may have a bit more overhead, but avoids the need to think about atomicity and memory ordering. It also gives you the option to easily do a blocking wait if that later becomes useful. If you've designed your program around an atomic_flag and later find a situation where you must wait to take the lock, you may find yourself stuck with either spinning while continually retrying the lock (which is wasteful of CPU cycles), or something like std::this_thread::yield(), which may wait for longer than necessary after the lock is available.

It's true this pattern is somewhat unusual. If there is always non-critical work to be done that doesn't need the lock, commonly you'd design your program to have a separate thread that just does the non-critical work continuously, and then the "critical" thread can just block as it waits for the lock.

Upvotes: 2
