Reputation: 1570
I have been studying the memory order semantics in C++11 and am having some difficulty understanding how `memory_order_acquire` works at the CPU level.
According to cppreference:
A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below)
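To make the quoted rule concrete, here is a minimal message-passing sketch (the names `payload`, `ready` and `run_message_passing` are made up for illustration). The release store publishes the plain write to `payload`, and the acquire load in the consumer keeps the later read of `payload` from being performed before the flag has been observed as `true`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish payload

int run_message_passing() {
    int observed = -1;
    std::thread producer([] {
        payload = 42;                                  // (1) ordinary write
        ready.store(true, std::memory_order_release);  // (2) release: (1) may not move below this store
    });
    std::thread consumer([&observed] {
        // (3) acquire: the read in (4) may not move above this load
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the producer publishes
        }
        observed = payload;       // (4) synchronized with (2), so this reads 42
        assert(observed == 42);
    });
    producer.join();
    consumer.join();
    return observed;
}
```

If the consumer's load were `memory_order_relaxed` instead, there would be no synchronization, and the plain read of `payload` would be a data race.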
The part I really can't understand is:
no reads or writes in the current thread can be reordered before this load.
What happens if the CPU has already reordered instructions before it even reaches the `memory_order_acquire` part? Does the CPU revert all the work it has done? How can this be guaranteed?
Thank you.
Upvotes: 0
Views: 199
Reputation: 180155
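CPUs don't "reach" the `memory_order_acquire` part. Those are instructions for the compiler. The compiler has to translate them into suitable machine code, using its knowledge of the CPU memory model: it must not move memory accesses across the acquire load itself, and it must emit whatever instructions the target needs so that the hardware doesn't reorder them either.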
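A minimal sketch of what that means in practice, using a hypothetical helper `read_after_flag`: the source only expresses the ordering constraint, and the compiler chooses the instructions. The mappings in the comments are the commonly used ones for these targets, but it is worth checking your own compiler's output (for example on Compiler Explorer) rather than relying on them.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, for illustration only.
// The memory_order argument constrains the compiler: it must not hoist the
// read of *data above the load, and it must emit whatever the target needs.
// Commonly used mappings for an acquire load (implementation-dependent):
//   x86-64:  a plain MOV (x86 loads already have acquire semantics)
//   AArch64: LDAR (load-acquire) instead of a plain LDR
//   ARMv7:   LDR followed by a DMB barrier
int read_after_flag(const std::atomic<bool>& flag, const int* data) {
    assert(data != nullptr);
    if (flag.load(std::memory_order_acquire)) {
        return *data;  // may not be performed before the acquire load
    }
    return -1;         // flag not set yet
}
```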
For instance, on a hypothetical CPU that never reorders across more than 2 instructions, inserting 2 NOP instructions would be a rather trivial way to achieve that part of the semantics.
Upvotes: 1
Reputation: 3557
As noted in the second paragraph here:
The instructions of the program may not be run in the correct order, as long as the end result is correct.
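Out-of-order execution (OoOE) doesn't blindly execute whatever happens to be ready. The CPU contains logic that explicitly prevents those accesses from being reordered across the boundary in a way that could be observed. On x86, for example, loads may be executed speculatively ahead of earlier loads; if another core invalidates the cache line before such a load retires, the core discards the speculative results and re-executes (a "memory ordering machine clear"). So yes, the CPU can throw away work it has already done. As noted elsewhere in that article, OoOE is expensive in silicon, quite likely because of issues of this sort.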
As noted in this SO question, memory barriers do come with a cost, which makes a lot of sense in light of the above: they force the normal OoOE pipeline to take a hit.
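The same guarantee can also be written with explicit fences, which is closer to the "memory barrier" wording above. A sketch with hypothetical names: a relaxed load followed by `std::atomic_thread_fence(std::memory_order_acquire)`, paired with a release fence before a relaxed store. On some architectures a standalone fence can be a stronger (and potentially more expensive) instruction than a load-acquire, because it orders all earlier loads against all later accesses rather than just the accesses after one particular load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int fence_payload = 0;
std::atomic<bool> fence_ready{false};

int run_with_fences() {
    int observed = -1;
    std::thread producer([] {
        fence_payload = 7;
        std::atomic_thread_fence(std::memory_order_release);  // release barrier
        fence_ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&observed] {
        while (!fence_ready.load(std::memory_order_relaxed)) {
            // spin until the flag is seen
        }
        // The release fence synchronizes with this acquire fence because the
        // relaxed load above read the value written after the release fence.
        std::atomic_thread_fence(std::memory_order_acquire);  // acquire barrier
        observed = fence_payload;
        assert(observed == 7);
    });
    producer.join();
    consumer.join();
    return observed;
}
```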
Upvotes: 0