Alex

Reputation: 13116

Is it sufficient to use std::memory_order_acq_rel with a single atomic variable for add/sub/inc/dec?

It is known that Release-Acquire ordering (std::memory_order_acq_rel) is sufficient when only one atomic variable is stored to or loaded from: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Does this also hold for other elementary wait-free operations such as addition, subtraction, increment, and decrement?

That is, is the next() function in the following C++ code thread-safe on both weak (ARM, ...) and strong (x86, ...) memory models, or does it need a different (weaker or stronger) memory ordering?

#include <iostream>
#include <atomic>
using namespace std;

class progression_lf {
 public:
    progression_lf() : n(0) {}

    int next() {
        // memory_order_acq_rel - enough, and increases performance for the weak memory models: arm, ...
        int const current_n = n.fetch_add(1, std::memory_order_acq_rel);
        int result = 2 + (current_n - 1)*3;
        return result;
    }

    // Ask the object itself: ATOMIC_INT_LOCK_FREE can be 1 ("sometimes
    // lock-free"), which would wrongly convert to true.
    bool is_lock_free() const { return n.is_lock_free(); }

 private:
    std::atomic<int> n;
};

int main() {

    // reference (single thread)
    for(int n = 0; n < 10; ++n) {
        std::cout << (2+(n-1)*3) << ", ";
    }
    std::cout << std::endl;

    // wait-free (multi-thread safety)
    progression_lf p;
    for(int n = 0; n < 10; ++n) {
        std::cout << (p.next()) << ", ";
    }
    std::cout << std::endl; 

    std::cout << "lock-free & wait-free: " << 
        std::boolalpha << p.is_lock_free() << 
        std::endl;

    return 0;
}

Upvotes: 2

Views: 523

Answers (1)

Anton

Reputation: 6537

I'm afraid you don't need any C++ memory ordering stronger than relaxed here if your threads need nothing more than a unique number. Atomicity is enough, and std::memory_order_relaxed guarantees it:

Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation.

In practice, though, an atomic read-modify-write operation on x86 still compiles to a lock-prefixed instruction (e.g. lock xadd for fetch_add), which implies a full memory barrier regardless of the ordering you request.

You can see what different compilers generate for different platforms here.

Upvotes: 2
