Davy Landman
Davy Landman

Reputation: 15438

Do you need to pair memory_order_acquire and memory_order_release around a block of code?

I'm making a profiler for an interpreter, I need the interpreter to write the current frame position somewhere on every call. Then sample that information every X ms. I initially started with rigtorp's spinlock around the frame position, but that had quite an effect on the runtime of the interpreter (profiling pointed at the locking acquire time for every loop through the interpreter). So after reading quite some pages about memory fences I came up with a more efficient solution, but I would like to know if this is the correct interpretation of the relation between memory_order_relaxed & acquire/release.

#include <memory>
#include <chrono>
#include <string>
#include <iostream>
#include <thread>
#include <immintrin.h>
#include <cstring>
#include <atomic>

using namespace std;
 
typedef struct frame {
    uint8_t op;
    uint16_t arg;
    uint32_t check;
} frame;


static constexpr unsigned int FRAME_BUFFER_SIZE = 8 * 1024;

static atomic<unsigned int> index;
static frame frames[FRAME_BUFFER_SIZE];

static void writer() {
    uint8_t op = 0;
    uint16_t arg = 0;
    for (;;) {
        op++;
        arg++;
        const auto newIndex = index.load(memory_order_relaxed) + 1;
        auto &target = frames[newIndex % FRAME_BUFFER_SIZE];
        target.op = op;
        target.arg = arg;
        target.check = static_cast<uint32_t>(arg) + op;
        index.store(newIndex, memory_order_release);
        _mm_pause(); // give the other threads some time
    }
}

static void reader() {
    for (;;) {
        const auto lastValidIndex = index.load(memory_order_acquire);
        // we race, hoping that the FRAME_BUFFER_SIZE is enough room 
        // to avoid writter catching up to us
        const auto snapshot = frames[lastValidIndex % FRAME_BUFFER_SIZE];
        if ((static_cast<uint32_t>(snapshot.arg) + snapshot.op) != snapshot.check) {
            cout << "Invalid snapshot\n";
            exit(1);
        }
        // we sleep a bit, since the reader is only intendede to read once in a while
        this_thread::sleep_for(chrono::milliseconds(1)); 

    }
}

int main() {
    cout << "Starting race\n";
    index = 0;
    memset(frames, 0, sizeof(frames));
    thread w(writer);
    thread r(reader);
    w.join();
    r.join();
    return 0;
}

So the strategy is as follows, we have a circular buffer, where the writer is writing into, it's the only one that mutates the index variable, so the first read is memory_order_relaxed. Then we update the value in the array, and then we store the new "full" index, this time with a memory_order_release. The reader only reads the index (this case with a memory_order_acquire) and then indexes of that location in the array.

So my questions are:

Upvotes: 0

Views: 104

Answers (0)

Related Questions