MaPo

Reputation: 855

C++: a question about memory_order_relaxed

I'm comparing the book C++ Concurrency in Action by Anthony Williams with this talk.

I found something in the latter that does not fit the explanation in the former, and I would like to ask how to reconcile this apparent inconsistency.

The example is taken from the talk here:

// in main thread
std::atomic<int> x(0), y(0);

// thread 1
int r1 = x.load(std::memory_order_relaxed);
y.store(r1, std::memory_order_relaxed);

// thread 2
int r2 = y.load(std::memory_order_relaxed);
x.store(42, std::memory_order_relaxed);

Commenting on the slide, the speaker says that an outcome where r1 = r2 = 42 is fine, since, he says, the compiler can freely reorder the instructions of the second thread.
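For anyone who wants to experiment, here is a minimal sketch that wraps the snippet above in real threads. The `Outcome` struct and the `run_relaxed_once` helper are hypothetical names introduced only for this illustration; they are not from the talk or the book.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Result of one run of the two-thread example (hypothetical type).
struct Outcome {
    int r1;
    int r2;
};

// Hypothetical helper: runs the example once and reports what each thread read.
Outcome run_relaxed_once() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(42, std::memory_order_relaxed);
    });

    // join() synchronizes with thread completion, so r1 and r2 are safe to read.
    t1.join();
    t2.join();

    // A load can only return the initial 0 or a value that some thread stored.
    assert(r1 == 0 || r1 == 42);
    assert(r2 == 0 || r2 == 42);
    return {r1, r2};
}
```

Which of the possible outcomes actually shows up depends on the compiler, the hardware, and timing. A run that never produces r1 = r2 = 42 does not show that the outcome is forbidden, only that this particular build on this particular machine did not produce it.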

In the book, on page 151 I find

Relaxed operations on different variables can be freely reordered provided they obey any happens-before relationships they’re bound by (for example, within the same thread).

This seems to contradict the statement that (within the same thread) the compiler is free to reorder relaxed memory operations.

Moreover, the author of the book goes on to explain relaxed memory ordering with a metaphor:

To understand how this works, imagine that each variable is a man in a cubicle with a notepad. On his notepad is a list of values. You can phone him and ask him to give you a value, or you can tell him to write down a new value. If you tell him to write down a new value, he writes it at the bottom of the list. If you ask him for a value, he reads you a number from the list. The first time you talk to this man, if you ask him for a value, he may give you any value from the list he has on his pad at the time. If you then ask him for another value, he may give you the same one again or a value from farther down the list. He’ll never give you a value from farther up the list. If you tell him to write down a number and then subsequently ask him for a value, he’ll give you either the number you told him to write down or a number below that on the list.
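The "never a value from farther up the list" rule can be sketched in code, assuming a single writer that stores increasing values so that a larger value always means "farther down the list". The helper name `reader_never_goes_backwards` is made up for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one writer stores 1, 2, ..., last_value with relaxed
// ordering while one reader polls the same variable with relaxed loads.
// Returns true if the reader never observed an older value after a newer one.
bool reader_never_goes_backwards(int last_value) {
    std::atomic<int> v(0);
    bool monotonic = true;

    std::thread writer([&] {
        for (int i = 1; i <= last_value; ++i)
            v.store(i, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        int previous = 0;
        while (previous < last_value) {
            int current = v.load(std::memory_order_relaxed);
            if (current < previous)
                monotonic = false;  // would mean reading "farther up the list"
            previous = current;
        }
    });

    writer.join();
    reader.join();
    return monotonic;
}
```

The reader may skip values, and it may see the same value many times, but even with relaxed ordering it should never see the sequence go backwards. Note that this guarantee concerns a single variable only; it says nothing about how accesses to two different variables relate to each other.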

(I think this is simply a metaphor for relaxing cache coherence among threads for the atomic variables involved in relaxed memory operations)

However, the example from the talk does not fit this logic: at the beginning, the two men in the cubicles (x and y) have just x = {0} and y = {0} on their notepads. Then, when they are asked to load the values of x and y, how can they give a value different from 0?

QUESTIONS

  1. Is reordering operations allowed within the same thread?
  2. How can I reconcile the book and the talk?

Upvotes: 3

Views: 329

Answers (2)

HolyBlackCat

Reputation: 96791

I believe the "man in the cubicle" metaphor is not relaxed enough.

The way I deal with standardese surrounding atomics is leaving my single-threaded intuition at the door, and asking "what can stop this weird scenario from happening?". If there's nothing preventing it, then it can theoretically happen.

I don't see anything that prevents r1 = r2 = 42 here.


Each atomic has its own modification order ("the notepad" from the metaphor), but they're completely independent from each other unless something synchronizes them (there's no synchronization in your example).

Each modification order has to be consistent with each individual thread, but since in your example each atomic is accessed once per thread, this rule is of no use either.
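As a contrast, here is a sketch of one way to add the missing synchronization, assuming we are allowed to change the orderings on y: the store becomes a release and the load becomes an acquire. The helper name `both_read_42_with_acq_rel` is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: same example, but y is written with release and read
// with acquire. Returns true if the run ended with r1 == 42 and r2 == 42.
bool both_read_42_with_acq_rel() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_release);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_acquire);
        x.store(42, std::memory_order_relaxed);
    });

    // join() synchronizes with thread completion, so r1 and r2 are safe to read.
    t1.join();
    t2.join();
    return r1 == 42 && r2 == 42;
}
```

The reasoning, as far as I can tell: if r2 is 42, the acquire load read the value written by the release store, so the store synchronizes with the load. That puts thread 1's x.load before thread 2's x.store(42) in happens-before, so the load cannot see 42, which contradicts r1 = 42. With this change something does prevent the weird scenario, so the function should always return false.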


While thread 2 seems to "reorder" x.store(42, relaxed) before y.load(relaxed), the standard isn't worded in terms of reorderings, so formally it's not correct to say that they got reordered.

In the picture below, Thread 1 and Thread 2 are the "sequenced-before" orders of the individual threads, and x and y are the modification orders of the individual variables.

As you can see, in none of the orders does y.load(relaxed) come after x.store(42, relaxed). The only order that contains both is thread 2's sequenced-before, and there the load comes first.

[Picture: the sequenced-before orders of threads 1 and 2 and the modification orders of x and y]

Upvotes: 3

Peter Cordes

Reputation: 365537

In terms of the cubicle analogy, phone calls asking to read a variable can be put on hold or not answered right away (cache miss load), such that a write from another thread has a chance to get a value added to the list (the modification-order for the variable) before the cubicle guy checks the list and responds to the read request.

(Or if the load address wasn't ready yet, the core running the thread might not make the phone call until later. But this model doesn't account for a store buffer. Speculative execution needs to avoid making changes to shared state that it's not sure are correct yet.)


https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ has an analogy for most kinds of reordering in terms of requests to a shared server to push or pull on a git repo, where the network pipeline between client and server is like a store buffer, which naturally introduces at least StoreLoad reordering because CPUs want to load early and store late.

But the relevant reordering here is LoadStore:

Unlike #LoadLoad and #StoreStore, there’s no clever metaphor for #LoadStore in terms of source control operations. The best way to understand a #LoadStore barrier is, quite simply, in terms of instruction reordering.

[... see the article for the rest of the explanation...]


"Instruction" reordering isn't always what happens; in-order CPUs can do LoadStore reordering by allowing out-of-order completion of loads, only stalling if something actually tries to read the register result before it's ready. The critical moment for a load is when it copies data out of coherent cache (whenever it's ready, at any point after the load executed). For a store, it is when the data is copied from the store buffer into coherent cache, into a line that this core has obtained exclusive ownership of (MESI Modified or Exclusive state), after the store has retired from out-of-order execution and is therefore known to be non-speculative. Unlike loads, speculative stores need to be buffered so they don't "infect" other cores with mis-speculated state.

Upvotes: 2
