Is `memory_order_relaxed` necessary to prevent partial reads of atomic stores?

Question

Suppose thread 1 is doing atomic stores on a variable v using memory_order_release (or any other order) and thread 2 is doing atomic reads on v using memory_order_relaxed.

It should be impossible to have partial reads in this case. An example of partial reads would be reading the first half of v from the latest value and the second half of v from the old value.

1. If thread 2 now reads v without using atomic operations, can we have partial reads in theory?
2. Can we have partial reads in practice? (Asking because I think this shouldn't matter on most processors, but I'm not sure.)

Peter Cordes · Accepted Answer

For 1. how do you propose doing that?

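`std::atomic<T>` is a class template that overloads the implicit conversion to `T` (`operator T()`) to be like `.load(std::memory_order_seq_cst)`, so even a plain-looking read of `v` is an atomic load. That makes tearing impossible. A seq_cst atomic access is like relaxed plus some ordering guarantees.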
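As a rough illustration of the scenario in the question, here is a minimal sketch (the helper names `count_torn_reads` and `check_no_tearing` are made up for this example). The writer only stores values whose upper and lower 32-bit halves match, so a load that mixed two stores would show mismatched halves. It needs thread support when building (e.g. `-pthread` with GCC or Clang).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: one thread does release stores, this thread does
// relaxed and implicit (seq_cst) loads, and we count any torn values.
long count_torn_reads(std::uint32_t iterations) {
    std::atomic<std::uint64_t> v{0};
    std::atomic<bool> done{false};

    std::thread writer([&] {
        for (std::uint32_t i = 1; i <= iterations; ++i) {
            // Both 32-bit halves of every stored value are equal to i.
            std::uint64_t x = (static_cast<std::uint64_t>(i) << 32) | i;
            v.store(x, std::memory_order_release);
        }
        done.store(true, std::memory_order_release);
    });

    long torn = 0;
    while (!done.load(std::memory_order_acquire)) {
        std::uint64_t relaxed = v.load(std::memory_order_relaxed);
        std::uint64_t implicit = v;  // same as v.load(std::memory_order_seq_cst)
        if ((relaxed >> 32) != (relaxed & 0xFFFFFFFFu)) ++torn;
        if ((implicit >> 32) != (implicit & 0xFFFFFFFFu)) ++torn;
    }
    writer.join();
    return torn;
}

// Usage: atomic loads never observe a mix of two different stores.
void check_no_tearing() { assert(count_torn_reads(100000) == 0); }
```

The count is zero by the guarantees of `std::atomic`, whichever memory order the load uses; the order only affects how the load is ordered relative to other operations.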

The template also overloads operators like ++ to do an atomic .fetch_add(1, mo_seq_cst). (Or for pre-increment, 1+fetch_add to produce the already-incremented value).
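A small sketch of that equivalence, with made-up helper names (`post_increment`, `pre_increment`, `pre_increment_explicit`, `check_increments`):

```cpp
#include <atomic>
#include <cassert>

// Post-increment returns the value from before the increment.
int post_increment(std::atomic<int>& a) {
    return a++;  // like a.fetch_add(1, std::memory_order_seq_cst)
}

// Pre-increment returns the already-incremented value.
int pre_increment(std::atomic<int>& a) {
    return ++a;  // like a.fetch_add(1, std::memory_order_seq_cst) + 1
}

// The same pre-increment spelled out by hand.
int pre_increment_explicit(std::atomic<int>& a) {
    return a.fetch_add(1, std::memory_order_seq_cst) + 1;
}

// Usage: each call is a single atomic read-modify-write.
void check_increments() {
    std::atomic<int> c{5};
    assert(post_increment(c) == 5);          // c is now 6
    assert(pre_increment(c) == 7);           // c is now 7
    assert(pre_increment_explicit(c) == 8);  // c is now 8
    assert(c.load() == 8);
}
```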

Of course, if you look at the bytes of the object representation of an `atomic<T>` by reading it through a non-atomic `char*` (e.g. with `memcpy(&tmp, &v, sizeof(int))`), that's UB if another thread is modifying it. And yes, you can get tearing in practice, depending on how you do it.

Tearing is more likely for objects too large to be lock-free, but it is also possible on some implementations for smaller ones, e.g. 8-byte objects on a 32-bit system that can implement 8-byte atomicity with special instructions but normally will just use two 32-bit loads for a plain access.
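To see which category a type falls into on your target, you can ask the implementation at compile time. This is only a sketch: it assumes C++17 for `is_always_lock_free`, and `Big` is a hypothetical 64-byte payload chosen to be wider than anything mainstream targets handle lock-free.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 64-byte payload, assumed too wide for lock-free atomics.
struct Big {
    char bytes[64];
};

// Compile-time answers for this target (C++17).
constexpr bool int_lock_free = std::atomic<int>::is_always_lock_free;
constexpr bool u64_lock_free = std::atomic<std::uint64_t>::is_always_lock_free;
constexpr bool big_lock_free = std::atomic<Big>::is_always_lock_free;

// Expectations for mainstream targets; the 8-byte case is deliberately
// left unchecked because it is the one that varies between platforms.
static_assert(int_lock_free, "int atomics are expected to be lock-free");
static_assert(!big_lock_free, "a 64-byte atomic is expected to need a lock");
```

Being lock-free says the atomic operations have an instruction-level implementation; it does not promise that a non-atomic access to the same bytes uses those instructions.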

e.g. on 32-bit x86, an atomic 8-byte load can be done with an SSE load and then bouncing the value back to integer regs, or with lock cmpxchg8b. Compilers don't do that for a plain load when they just want the value in two integer registers.

But many 32-bit RISCs that provide atomic 8-byte loads have a double-register load that produces 2 output registers from one instruction. e.g. ARM ldrd or MIPS ld. Compilers do use these to optimize aligned 8-byte loads even when atomicity isn't the goal, so you'd probably "get lucky" and not see tearing anyway.

Small objects would typically happen to be atomic anyway; see Why is integer assignment on a naturally aligned variable atomic on x86?

Of course the non-atomic access wouldn't assume that the value could change asynchronously, so a loop could use a stale value indefinitely. Unlike a relaxed atomic, which on current compilers is like volatile in that it always re-accesses memory. (Via coherent hardware cache of course, just not keeping the value in a register.)
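A minimal sketch of the difference, using a made-up `wait_for_worker` helper. The spin loop does a relaxed load on every iteration, so it exits once the worker's store becomes visible. It needs thread support when building (e.g. `-pthread` with GCC or Clang).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: spin on a relaxed atomic flag until a worker sets it.
int wait_for_worker() {
    std::atomic<bool> ready{false};
    int payload = 0;  // plain int: only read after join(), so no data race

    std::thread worker([&] {
        payload = 42;
        ready.store(true, std::memory_order_relaxed);
    });

    // Every iteration performs a real load of `ready`. With a plain
    // `bool ready` this would be a data race (UB), and the compiler could
    // legally read the flag once and keep using that value.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }

    worker.join();  // the worker's completion synchronizes with join()
    return payload;
}

// Usage: the loop terminates and the payload is visible after join().
void check_wait_for_worker() { assert(wait_for_worker() == 42); }
```

Here the relaxed flag only signals that the loop may stop; the read of `payload` is made safe by `join()`, not by the flag. If the payload were read before joining, the flag would need a release store paired with an acquire load.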
