Artem Konovalenkov

Reputation: 141

Memory barriers: A hardware view for software hackers - invalidate queues

Even though the paper "Memory Barriers: a Hardware View for Software Hackers" is considered extremely old (by its author; it seems Paul himself answered this question), I find it an excellent aid for building a mental model of memory ordering.

There is a little thing though that I don't understand:

Let's consider the page with a memory barrier:

[screenshot: memory barrier page 1]

Step 4 states that "b=1" is written to a store buffer because "a=1" is not written to the cache yet.
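To check that I read that rule correctly, here is a toy, single-threaded model of it in C. Everything in it (`toy_cpu`, `toy_store`, `toy_smp_mb`, `toy_drain`, the `owned` flag, the capacity of 8) is my own naming and simplification, not code from the paper: a barrier marks the entries currently in the store buffer, and while a marked entry is pending, later stores are buffered too, even when the CPU already owns the target cache line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SB_CAP = 8 };                 /* arbitrary toy capacity (assumption) */

struct sb_entry {
    int *line;                       /* cache line the store targets */
    int value;                       /* value waiting to be written */
    bool marked;                     /* set when a barrier has seen this entry */
};

struct toy_cpu {
    struct sb_entry sb[SB_CAP];      /* store buffer, oldest entry first */
    size_t sb_len;
};

static bool has_marked_entry(const struct toy_cpu *cpu)
{
    for (size_t i = 0; i < cpu->sb_len; i++)
        if (cpu->sb[i].marked)
            return true;
    return false;
}

/* Barrier: mark everything currently waiting in the store buffer. */
static void toy_smp_mb(struct toy_cpu *cpu)
{
    for (size_t i = 0; i < cpu->sb_len; i++)
        cpu->sb[i].marked = true;
}

/* Returns true if the store went straight to the cache line,
 * false if it had to be placed in the store buffer. */
static bool toy_store(struct toy_cpu *cpu, int *line, int value, bool owned)
{
    if (owned && !has_marked_entry(cpu)) {
        *line = value;               /* line is owned, nothing marked ahead */
        return true;
    }
    assert(cpu->sb_len < SB_CAP);
    cpu->sb[cpu->sb_len++] = (struct sb_entry){ line, value, false };
    return false;
}

/* The awaited acknowledgements arrived: apply pending stores in order. */
static void toy_drain(struct toy_cpu *cpu)
{
    for (size_t i = 0; i < cpu->sb_len; i++)
        *cpu->sb[i].line = cpu->sb[i].value;
    cpu->sb_len = 0;
}
```

In this model, storing to "a" with `owned == false`, calling `toy_smp_mb()`, and then storing to "b" with `owned == true` leaves "b" unchanged until `toy_drain()` runs. Without the barrier call, the same store to "b" goes straight to its line while "a" is still buffered.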

The thing that I can't get is why on the next page:

[screenshot: the next page of the paper]

in step 3, "b=1" is written to the cache line, even though there is a memory barrier after "a=1" and "a=1" has not yet been written to the cache? Following the reasoning on the previous page, "b=1" should be written to the cache only at (or after) step 10, when the store buffer entry containing "a=1" is written to the cache.

Upvotes: 3

Views: 817

Answers (1)

vgru

Reputation: 51204

The PDF that you linked differs from the screenshot in your question, so I presume the screenshot comes from an older version that was incorrect (or at least not precise enough).

Chapter 4.3 actually starts with the following remark:

Let us suppose that CPUs queue invalidation requests, but respond to them immediately. This approach minimizes the cache-invalidation latency seen by CPUs doing stores, but can defeat memory barriers, as seen in the following example.

The sequence is also a bit different from what you posted:

  1. CPU 0 executes a=1. The corresponding cache line is read-only in CPU 0’s cache, so CPU 0 places the new value of "a" in its store buffer and transmits an "invalidate" message in order to flush the corresponding cache line from CPU 1's cache.

  2. CPU 1 executes while (b==0) continue;, but the cache line containing "b" is not in its cache. It therefore transmits a "read" message.

  3. CPU 1 receives CPU 0's "invalidate" message, queues it, and immediately responds to it.

  4. CPU 0 receives the response from CPU 1, and is therefore free to proceed past the smp_mb() on line 4 above, moving the value of "a" from its store buffer to its cache line.

I believe this is a hypothetical scenario, but when you take this into account, the obviously problematic part is CPU 1 acknowledging an "invalidate" message before actually invalidating its cache, which makes CPU 0 think it can proceed.
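A minimal, single-threaded replay of that sequence in C may make the problem easier to see. This is only a sketch under my own assumptions: `reader_cache`, `recv_invalidate`, `process_invalidate_queue`, `load_a` and `replay` are hypothetical names, `a_home` and `b_home` stand in for the up-to-date values held on CPU 0's side, and the `reader_barrier` flag models a barrier on CPU 1 that applies queued invalidations before later loads (which, as far as I recall, is the remedy the paper goes on to describe).

```c
#include <stdbool.h>

/* CPU 1's view of the cache line holding "a" (toy model). */
struct reader_cache {
    int a_copy;         /* CPU 1's cached value of "a" */
    bool a_valid;       /* is that cached copy still usable? */
    bool inv_queued;    /* an "invalidate" for "a" is waiting in the queue */
};

/* CPU 1 receives the "invalidate" and acknowledges it in both cases.
 * The only difference is whether the line is dropped before the ack. */
static void recv_invalidate(struct reader_cache *rc, bool apply_before_ack)
{
    if (apply_before_ack)
        rc->a_valid = false;     /* invalidate first, then acknowledge */
    else
        rc->inv_queued = true;   /* queue it, acknowledge immediately */
}

/* Apply whatever is waiting in the invalidate queue. */
static void process_invalidate_queue(struct reader_cache *rc)
{
    if (rc->inv_queued) {
        rc->a_valid = false;
        rc->inv_queued = false;
    }
}

/* Load "a": a hit returns the cached copy, a miss fetches the current value. */
static int load_a(struct reader_cache *rc, const int *a_home)
{
    if (!rc->a_valid) {
        rc->a_copy = *a_home;
        rc->a_valid = true;
    }
    return rc->a_copy;
}

/* Returns the value of "a" that CPU 1 observes after it has seen b == 1. */
static int replay(bool apply_before_ack, bool reader_barrier)
{
    int a_home = 0, b_home = 0;
    struct reader_cache rc = { .a_copy = 0, .a_valid = true, .inv_queued = false };

    recv_invalidate(&rc, apply_before_ack);  /* steps 1 and 3 */
    a_home = 1;              /* step 4: CPU 0 got the ack and commits a=1 */
    b_home = 1;              /* CPU 0 then executes b=1 */

    while (b_home == 0)      /* CPU 1 misses on "b" and reads the new value */
        continue;
    if (reader_barrier)
        process_invalidate_queue(&rc);
    return load_a(&rc, &a_home);
}
```

Here `replay(false, false)` yields the stale value 0, because CPU 1 still hits its old copy of "a" after having acknowledged the invalidate. `replay(true, false)` and `replay(false, true)` both yield 1.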

Upvotes: 1
