user997112

Reputation: 30605

Is the MESI protocol enough, or are memory barriers still required? (Intel CPUs)

I found an Intel document which states that memory barriers are required when string instructions are used (assembly string instructions, not std::string), to prevent the CPU from reordering them.

However, are memory barriers also required when two threads (on two different cores) access the same memory? The scenario I had in mind is one where a core that doesn't "own" the cache line writes to this memory, and the write goes to its store buffer (as opposed to its cache). Is a memory barrier required to flush the value from the store buffer to the cache, so that the other core can obtain this value?

I am unsure whether, on Intel, the MESI protocol handles this?

(What I have tried to (badly) explain above is better described in the paper below, pages 6-12):

http://www.puppetmastertrading.com/images/hwViewForSwHackers.pdf

The above paper is very general and I am unsure how Intel CPUs practically handle the problem.

Upvotes: 4

Views: 2161

Answers (2)

Peter Cordes

Reputation: 363980

I think you're talking about ERMSB (fast strings) in Intel IvB and later making rep movs use weakly-ordered writes.

My conclusion from Intel's docs is that you still don't need SFENCE to order those stores relative to other stores, and of course you can't run SFENCE in the middle of a rep movsb. See that answer for more stuff in general about memory barriers on x86.

AFAICT, all you need to do is avoid using the same rep movs to write a buffer and the flag that readers will check to see if the buffer is ready. A reader could see the flag before all of the stores to the buffer are visible to it. This is the only way the new ERMSB feature affects correctness, for programs that were already correct (i.e. didn't depend on flukes of timing). It has a positive effect on performance for memcpy / memset.
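As a rough illustration of keeping the buffer write and the flag write separate, here is a minimal C++ sketch. The `Mailbox` type and the function names are hypothetical, made up for this example. It assumes the compiler may lower `memcpy` to `rep movsb`, and that the flag is published with its own release store (which on x86 is expected to compile to an ordinary store, with no SFENCE).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <thread>

// Hypothetical shared state: a payload buffer plus a separate "ready" flag.
struct Mailbox {
    char buffer[64] = {};
    std::atomic<bool> ready{false};
};

// Writer: fill the buffer first (memcpy may or may not become rep movsb),
// then publish the flag with a separate release store. The flag is never
// written by the same copy that writes the buffer.
void publish(Mailbox& box, const char* msg) {
    std::size_t n = std::strlen(msg);
    assert(n < sizeof box.buffer);
    std::memcpy(box.buffer, msg, n + 1);
    box.ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag with an acquire load, then read the buffer.
std::string consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return std::string(box.buffer);
}

// Runs one reader and one writer thread; returns what the reader saw.
std::string run_mailbox_demo(const char* msg) {
    Mailbox box;
    std::string seen;
    std::thread reader([&] { seen = consume(box); });
    std::thread writer([&] { publish(box, msg); });
    writer.join();
    reader.join();
    return seen;
}
```

Calling `run_mailbox_demo("hello")` should hand the reader the complete string, because the acquire load that observes `ready == true` synchronizes with the release store that followed the copy.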

Upvotes: 1

Leeor

Reputation: 19706

MESI protocols apply to caches. Store buffering is essentially pre-cache: a buffered store has not yet been "released" to the outside world, and its synchronization point has not yet been determined.

You also need to keep in mind that cache coherency only guarantees that writes don't occur on stale copies of a cacheline and get lost along the way. The only guarantee of such protocols is to hide the fact that you have caches with copied values (a performance optimization in itself), and expose to the programmer/OS the illusion of a single level flat physical memory.

That, by itself, gives you no guarantee on the ordering of writes and reads from multiple cores. For that purpose you need to manage your code using additional constructs that the ISA provides, like locks and fences, and by relying on the memory ordering rules.
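The classic store-buffering litmus test is one way to see why coherency alone is not an ordering guarantee. The sketch below is only an illustration (the function name is made up): each thread stores to one variable and then loads the other. With `std::memory_order_seq_cst`, the C++ memory model forbids both threads reading 0, and on x86 compilers typically achieve that with a full barrier or an `xchg` on the store. With plain or relaxed stores, both loads returning 0 is allowed, because each store can still be sitting in its core's store buffer when the other core's load executes.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns
// how many runs ended with both threads having read 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

As written, `count_both_zero` should always return 0. Swapping the orderings to `std::memory_order_relaxed` removes that guarantee, although actually observing a non-zero count depends on timing.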

The situation you describe is not possible as it breaks the first part - a core that does not own a line can't write to memory, since it would miss the updated data in the core that does own the line (if one exists). What would happen under a MESI protocol is that the write will be buffered for a while, and when its turn comes to be issued, it would send a request for ownership that would invalidate all copies of that line in other cores (triggering a writeback if there's a modified copy), and fetch the updated data. Only then may the writing core modify the line and mark it as modified.
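The request-for-ownership sequence described above can be sketched as a toy model. This is not how any real CPU is implemented: it models a single cache line holding one `int`, two cores, and no store buffer, and all names (`ToySystem`, `Line`, `State`) are hypothetical.

```cpp
#include <array>

enum class State { Modified, Exclusive, Shared, Invalid };

// One core's copy of the single cache line in this toy model.
struct Line {
    State state = State::Invalid;
    int value = 0;
};

struct ToySystem {
    static constexpr int kCores = 2;
    std::array<Line, kCores> cache;
    int memory = 0;

    // Read: on a miss, pull the latest data (forcing a writeback if another
    // core holds the line Modified) and downgrade other copies to Shared.
    int read(int core) {
        if (cache[core].state != State::Invalid) {
            return cache[core].value;
        }
        bool others_have_copy = false;
        for (int i = 0; i < kCores; ++i) {
            if (i == core || cache[i].state == State::Invalid) continue;
            if (cache[i].state == State::Modified) memory = cache[i].value;
            cache[i].state = State::Shared;
            others_have_copy = true;
        }
        cache[core].state = others_have_copy ? State::Shared : State::Exclusive;
        cache[core].value = memory;
        return memory;
    }

    // Write: without ownership (Invalid or Shared), first issue a request
    // for ownership that writes back any Modified copy and invalidates all
    // other copies. Only then is the line modified locally.
    void write(int core, int value) {
        State s = cache[core].state;
        if (s == State::Invalid || s == State::Shared) {
            for (int i = 0; i < kCores; ++i) {
                if (i == core) continue;
                if (cache[i].state == State::Modified) memory = cache[i].value;
                cache[i].state = State::Invalid;
            }
        }
        cache[core].state = State::Modified;
        cache[core].value = value;
    }
};
```

For example, after `write(0, 7)` followed by `write(1, 9)`, core 0's copy is Invalid, core 1 holds the line Modified, and memory holds the written-back 7. A later `read(0)` returns 9 and leaves both copies Shared.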

However, if 2 cores write to the same line simultaneously, the MESI protocol only guarantees that these writes will have some order, not the specific one you might want. Worse - if each core writes several lines and you want atomicity around these writes, MESI doesn't guarantee that. You'll need to actively add a mutex or a barrier of some sort to force the HW to perform the writes the way you want.
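A minimal sketch of the multi-line case, assuming 64-byte cache lines and using hypothetical names: two fields are forced onto different lines with `alignas(64)`, and a `std::mutex` makes the pair of writes appear atomic to a reader that takes the same lock. Coherency keeps each line consistent on its own; the mutex is what ties the two updates together.

```cpp
#include <mutex>
#include <thread>

// Two values that must always be equal, deliberately placed on
// different cache lines (64 bytes is an assumption about the target).
struct Pair {
    std::mutex m;
    alignas(64) long a = 0;
    alignas(64) long b = 0;
};

// One thread updates both fields, another checks the invariant a == b.
// Returns how many times the reader saw the pair in a half-updated state.
int count_torn_reads(int iterations) {
    Pair p;
    int torn = 0;
    std::thread writer([&] {
        for (int i = 1; i <= iterations; ++i) {
            std::lock_guard<std::mutex> lock(p.m);
            p.a = i;
            p.b = i;
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(p.m);
            if (p.a != p.b) {
                ++torn;
            }
        }
    });
    writer.join();
    reader.join();
    return torn;
}
```

With the lock in place, `count_torn_reads` should return 0. Removing the `lock_guard` lines would turn this into a data race (undefined behavior in C++), and the reader could then observe `a` and `b` out of step.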

Upvotes: 4
