Jojolastiti
Jojolastiti

Reputation: 29

Can a compiler invent multiple loads when compiling one std::atomic load using memory_order_relaxed?

Taking the following piece of code

struct SharedMemory
{
  std::atomic<float> value;

  bool shouldReturnTrue()
  {
    float tmp = value.load(std::memory_order_relaxed);
    float a = tmp;
    float b = tmp;
    return a == b;
  }
};

Is it legal for the compiler to optimize it as follow and so break the thread safety ?

struct SharedMemory
{
  std::atomic<float> value;

  bool shouldReturnTrue()
  {
    float a = value.load(std::memory_order_relaxed);
    float b = value.load(std::memory_order_relaxed);
    return a == b;
  }
};

If so, should I force it to not optimize tmp using volatile keyword as follow ?

struct SharedMemory
{
  std::atomic<float> value;

  bool shouldReturnTrue()
  {
    volatile float tmp = value.load(std::memory_order_relaxed);
    float a = tmp;
    float b = tmp;
    return a == b;
  }
};

Upvotes: 1

Views: 63

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 365277

No, that wouldn't be a legal optimization. As you say, it could return false for cases other than tmp being NaN, if there was a writer. Therefore it wouldn't be valid per the as-if rule, because the C++ abstract machine does always return !isnan(tmp).

(If you'd picked a type like int where self == self was always true, it would be easier to talk about!)

The transformation you ask about is only legal for non-atomic variables, precisely because the value can be assumed not to be changing. If it is, that would be data-race undefined behaviour, and compilers are allowed to assume no UB. Thus they can invent another non-atomic non-volatile load if that's more convenient for register allocation.

But an atomic variable does have to be assumed to change value between any two reads (in this way similar to volatile in terms of what compilers have to assume about it), because that is well-defined behaviour that compilers have to respect.

So the compiler will never invent reads of atomic<> objects. (That linked article is about Linux kernel code, where they use volatile for hand-rolled atomics on compilers that are known to work the way they want, for the same reasons we use std::atomic<> in modern C++.)

memory_order_relaxed means that the ordering wrt. surrounding memory accesses is relaxed, not the atomicity guarantees. In this respect, relaxed isn't different from acquire or seq_cst. I guess you had some misconception that relaxed might be relevant here, but it isn't.


You're right that assigning the load result to volatile float tmp would make it less likely for a compiler to want to invent loads, but if that was a real problem (it isn't), a better way would be to use volatile atomic<float> value;. Again, this is not necessary. Especially now, since compilers currently don't optimize atomics at all (Why don't compilers merge redundant std::atomic writes?) but even in future with compilers that only give you the bare minimum ISO C++ standard and do optimize atomics by e.g. coalescing redundant back-to-back reads into one, they won't be allowed to do the other thing and invent extra writes, or invent extra reads and assume the value is the same.

Upvotes: 1

Related Questions