Reputation: 141
Even though "Memory Barriers: a Hardware View for Software Hackers" is considered extremely old (by its author; it seems Paul himself answered this question), I find it an excellent aid for building a mental model of memory ordering.
There is one small thing I don't understand, though:
Let's consider the page with a memory barrier:
Step 4 states that "b=1" is written to a store buffer because "a=1" is not written to the cache yet.
What I can't understand is why, on the next page, step 3 writes "b=1" to the cache line even though there is a memory barrier after "a=1" and "a=1" has not yet been written to the cache. Following the previous page's reasoning, "b=1" should be written to the cache only during or after step 10, when the store buffer entry containing "a=1" is written to the cache.
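To make my reading of the previous page concrete, here is a toy single-threaded C model of the rule as I understand it: smp_mb() marks the pending store-buffer entries, and while a marked entry exists every later store must also go through the store buffer. All names here (toy_cpu, toy_store, toy_smp_mb, toy_drain, b_is_buffered_behind_barrier) are made up for illustration, and the model drains the whole buffer at once, which is a simplification of mine rather than something the paper specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SB_CAP 8

/* One pending store in a toy per-CPU store buffer (hypothetical model). */
struct sb_entry { int *addr; int value; bool marked; };

struct toy_cpu {
    struct sb_entry sb[SB_CAP];
    size_t sb_len;
};

/* smp_mb() in this model: mark every entry currently in the store buffer. */
void toy_smp_mb(struct toy_cpu *c) {
    for (size_t i = 0; i < c->sb_len; i++)
        c->sb[i].marked = true;
}

bool has_marked_entry(const struct toy_cpu *c) {
    for (size_t i = 0; i < c->sb_len; i++)
        if (c->sb[i].marked)
            return true;
    return false;
}

/* A store is buffered if the line is not owned OR a marked entry is pending.
 * Returns true when the store went to the store buffer, false when it was
 * written straight to the "cache" (here simply *addr). */
bool toy_store(struct toy_cpu *c, int *addr, int value, bool line_owned) {
    if (!line_owned || has_marked_entry(c)) {
        assert(c->sb_len < SB_CAP);
        c->sb[c->sb_len++] = (struct sb_entry){ addr, value, false };
        return true;
    }
    *addr = value;
    return false;
}

/* The invalidate acknowledgement arrived: apply buffered stores in order. */
void toy_drain(struct toy_cpu *c) {
    for (size_t i = 0; i < c->sb_len; i++)
        *c->sb[i].addr = c->sb[i].value;
    c->sb_len = 0;
}

/* Replays CPU 0's side of: a = 1; smp_mb(); b = 1; */
bool b_is_buffered_behind_barrier(void) {
    int a = 0, b = 0;                      /* values visible in the cache */
    struct toy_cpu cpu0 = { .sb_len = 0 };

    toy_store(&cpu0, &a, 1, false);        /* line not owned: buffered */
    toy_smp_mb(&cpu0);                     /* marks the "a" entry */
    bool buffered = toy_store(&cpu0, &b, 1, true); /* owned, yet buffered */
    bool hidden = (b == 0);                /* cache still holds the old b */

    toy_drain(&cpu0);                      /* acknowledgement for "a" arrives */
    return buffered && hidden && a == 1 && b == 1;
}
```

Calling b_is_buffered_behind_barrier() returns true in this model, which is exactly why step 3 on the next page surprises me: the new value of "b" should not reach the cache before "a" does.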
Upvotes: 3
Views: 817
Reputation: 51204
The PDF that you posted differs from the screenshot in your question, so I presume the older version was incorrect (or at least not precise enough).
Chapter 4.3 actually starts with the following remark:
Let us suppose that CPUs queue invalidation requests, but respond to them immediately. This approach minimizes the cache-invalidation latency seen by CPUs doing stores, but can defeat memory barriers, as seen in the following example.
The sequence is also a bit different from what you posted:
1. CPU 0 executes a=1. The corresponding cache line is read-only in CPU 0's cache, so CPU 0 places the new value of "a" in its store buffer and transmits an "invalidate" message in order to flush the corresponding cache line from CPU 1's cache.
2. CPU 1 executes "while (b==0) continue;", but the cache line containing "b" is not in its cache. It therefore transmits a "read" message.
3. CPU 1 receives CPU 0's "invalidate" message, queues it, and immediately responds to it.
4. CPU 0 receives the response from CPU 1, and is therefore free to proceed past the smp_mb() on line 4 above, moving the value of "a" from its store buffer to its cache line.
I believe this is a hypothetical scenario, but once you take the queueing into account, the problematic part is clear: CPU 1 acknowledges the "invalidate" message before it has actually invalidated its cache line, which lets CPU 0 believe it is safe to proceed past the barrier.
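The scenario can be replayed with a small deterministic C model. This is only a sketch under my own assumptions: the names (toy_cpu1, apply_invalidate_queue, toy_load_a, bar_reads_a) are invented, the steps after step 4 (CPU 0 storing b=1 and answering CPU 1's "read") are my continuation of the sequence, and the barrier_in_bar flag models my understanding that a memory barrier on the reading side forces queued invalidations to be applied before later loads.

```c
#include <assert.h>
#include <stdbool.h>

/* A cache line in CPU 1's cache (hypothetical toy model). */
struct line { bool valid; int value; };

struct toy_cpu1 {
    struct line a;
    struct line b;
    bool a_invalidate_queued;   /* queued, acknowledged, not yet applied */
};

/* What a barrier on CPU 1 is assumed to do: apply queued invalidations. */
void apply_invalidate_queue(struct toy_cpu1 *c) {
    if (c->a_invalidate_queued) {
        c->a.valid = false;
        c->a_invalidate_queued = false;
    }
}

/* Load "a": a hit returns the cached (possibly stale) value, a miss
 * fetches the current value from CPU 0's cache. */
int toy_load_a(struct toy_cpu1 *c, int cpu0_a) {
    if (!c->a.valid) {
        c->a.value = cpu0_a;
        c->a.valid = true;
    }
    return c->a.value;
}

/* Returns the value of "a" that CPU 1 observes after its loop on "b" exits. */
int bar_reads_a(bool barrier_in_bar) {
    int cpu0_a = 0, cpu0_b = 0;   /* CPU 0's cache */
    int cpu0_store_buffer_a;      /* CPU 0's store buffer entry for "a" */
    struct toy_cpu1 cpu1 = {
        .a = { true, 0 },         /* shared copy of "a", old value */
        .b = { false, 0 },        /* "b" is not in CPU 1's cache */
        .a_invalidate_queued = false,
    };

    /* Step 1: a=1 goes to the store buffer, "invalidate" is sent. */
    cpu0_store_buffer_a = 1;
    /* Step 2: CPU 1 misses on "b" and sends "read" (answered below). */
    /* Step 3: CPU 1 queues the invalidate and acknowledges at once. */
    cpu1.a_invalidate_queued = true;
    /* Step 4: CPU 0 gets the acknowledgement and passes smp_mb(). */
    cpu0_a = cpu0_store_buffer_a;
    /* Assumed continuation: CPU 0 stores b=1 and answers the "read". */
    cpu0_b = 1;
    cpu1.b = (struct line){ true, cpu0_b };
    assert(cpu1.b.value == 1);    /* CPU 1's loop on "b" now exits */

    if (barrier_in_bar)
        apply_invalidate_queue(&cpu1);
    return toy_load_a(&cpu1, cpu0_a);
}
```

In this model bar_reads_a(false) yields the stale 0, because the line holding "a" is still valid in CPU 1's cache, while bar_reads_a(true) yields 1, because the queued invalidation is applied first and the load misses.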
Upvotes: 1