Confusing "Memory Barrier Example 1" in "Memory Barriers: a Hardware View for Software Hackers"?

Question

I am reading the excellent paper "Memory Barriers: a Hardware View for Software Hackers" by Paul E. McKenney, which has helped me a lot. However, I have a doubt about Section 6.2, "Example 1" (Table 2: Memory Barrier Example 1).

The author explains why the assertion can fire as follows:

Suppose CPU 0 recently experienced many cache misses, so that its message queue is full, but that CPU 1 has been running exclusively within the cache, so that its message queue is empty. Then CPU 0’s assignment to “a” and “b” will appear in Node 0’s cache immediately (and thus be visible to CPU 1), but will be blocked behind CPU 0’s prior traffic. In contrast, CPU 1’s assignment to “c” will sail through CPU 1’s previously empty queue. Therefore, CPU 2 might well see CPU 1’s assignment to “c” before it sees CPU 0’s assignment to “a”, causing the assertion to fire, despite the memory barriers.

To my understanding, CPU 0's assignment to "b" will NOT appear in Node 0's cache until smp_wmb() has completed on CPU 0, and smp_wmb() will block until all CPUs, including CPU 2, have responded. However, CPU 0's message queue is already full, so neither smp_wmb() nor CPU 0's assignment to "b" can complete immediately. This seems to conflict with the author's explanation.

I don't think the assertion on CPU 2 would fire. To show this, we only need to establish that x == 1 whenever z == 1. Here is my reasoning:

If z == 1 (from z = c;), then "c" has been set to 1 by CPU 1.
That in turn means "b" has been set to 1 by CPU 0.
As I stated above, the assignment to "a" and the smp_wmb() on CPU 0 must both have completed already.

So far we can conclude: for CPU 2, if z == 1, then CPU 0's assignment to "a" has already been perceived (it is at least in CPU 2's invalidate queue).

Then, after the smp_rmb() on CPU 2, x = a; will read the new value 1. In other words, for CPU 2's assertion, z == 1 implies x == 1, so it never fires.

That is my understanding, and I am rather confused now. Could anyone help me figure this out?

As a supplement, the cache system is NUCA (non-uniform cache architecture), as illustrated in Figure 8: Example Ordering-Hostile Architecture.

Perhaps the main confusion boils down to one question: can CPU 0 execute b = 1; while the "CPU 0 Message Queue" is jammed?
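To make the two readings concrete, here is a minimal toy simulation in Python. It is only a sketch under stated assumptions, not a model of a real coherence protocol: each node has one cache shared by its two CPUs, a write becomes visible in the writer's own node at once, and it reaches the other node only through the writing CPU's FIFO message queue. The names ToyMachine, run_example, wmb_waits_for_delivery and backlog are hypothetical, and the per-CPU statements in the comments (in particular CPU 1's spin loop) are my reading of Table 2. With wmb_waits_for_delivery=False the barrier only relies on FIFO order, as in the quoted explanation; with True it stalls until "a" has been delivered everywhere, as in my reasoning above.

```python
from collections import deque


class ToyMachine:
    """Two nodes, each with one cache shared by its CPUs (toy model only)."""

    def __init__(self, backlog):
        # Both node caches start with a == b == c == 0.
        self.cache = [dict(a=0, b=0, c=0), dict(a=0, b=0, c=0)]
        # One outgoing FIFO message queue per CPU; CPU 0 starts congested.
        self.queue = {cpu: deque() for cpu in range(4)}
        self.queue[0].extend(("unrelated", i) for i in range(backlog))

    @staticmethod
    def node_of(cpu):
        return cpu // 2  # CPUs 0 and 1 -> node 0; CPUs 2 and 3 -> node 1

    def write(self, cpu, var, value):
        # Visible at once in the writer's own node, queued for the other node.
        self.cache[self.node_of(cpu)][var] = value
        self.queue[cpu].append((var, value))

    def read(self, cpu, var):
        return self.cache[self.node_of(cpu)][var]

    def drain(self, cpu):
        # Deliver every pending message of `cpu`, oldest first, to the other node.
        while self.queue[cpu]:
            var, value = self.queue[cpu].popleft()
            self.cache[1 - self.node_of(cpu)][var] = value


def run_example(wmb_waits_for_delivery, backlog=8):
    m = ToyMachine(backlog)

    # CPU 0: a = 1; smp_wmb(); b = 1;
    m.write(0, "a", 1)
    if wmb_waits_for_delivery:
        m.drain(0)  # assumption: the barrier stalls until "a" is visible everywhere
    # Otherwise (assumption): FIFO order alone keeps "a" ahead of "b" in the queue.
    m.write(0, "b", 1)

    # CPU 1: while (b == 0); c = 1;
    assert m.read(1, "b") == 1  # same node as CPU 0, so the loop exits at once
    m.write(1, "c", 1)
    m.drain(1)  # CPU 1's queue was empty, so "c" goes straight through

    # CPU 2: z = c; smp_rmb(); x = a;
    z = m.read(2, "c")
    x = m.read(2, "a")  # smp_rmb() is a no-op here: no invalidate queue is modeled
    return z, x


for waits in (False, True):
    z, x = run_example(waits)
    print(f"wmb waits for delivery={waits}: z={z} x={x} "
          f"assertion holds={z == 0 or x == 1}")
```

Under these assumptions the first configuration ends with z == 1 and x == 0, so the assertion fires, while the second ends with z == 1 and x == 1. This does not decide which behavior the architecture in Figure 8 actually has; it only shows that the disagreement comes down to whether smp_wmb() must wait for the congested queue before b = 1; can proceed.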
