Reputation: 42235
Under Windows, there are three compiler-intrinsic functions to implement memory barrier:
1. _ReadBarrier;
2. _WriteBarrier;
3. _ReadWriteBarrier;
However, I found a weird problem: _ReadBarrier seems a dummy function doing nothing! The following is my assembly code generated by VC++ 2012.
My question is: How to implement a memory barrier function in assembly instructions?
int main()
{
013EEE10 push ebp
013EEE11 mov ebp,esp
013EEE13 sub esp,0CCh
013EEE19 push ebx
013EEE1A push esi
013EEE1B push edi
013EEE1C lea edi,[ebp-0CCh]
013EEE22 mov ecx,33h
013EEE27 mov eax,0CCCCCCCCh
013EEE2C rep stos dword ptr es:[edi]
int n = 0;
013EEE2E mov dword ptr [n],0
n = n + 1;
013EEE35 mov eax,dword ptr [n]
013EEE38 add eax,1
013EEE3B mov dword ptr [n],eax
_ReadBarrier();
n = n + 1;
013EEE3E mov eax,dword ptr [n]
013EEE41 add eax,1
013EEE44 mov dword ptr [n],eax
}
013EEE56 xor eax,eax
013EEE58 pop edi
013EEE59 pop esi
013EEE5A pop ebx
013EEE5B add esp,0CCh
013EEE61 cmp ebp,esp
013EEE63 call __RTC_CheckEsp (013EC3B0h)
013EEE68 mov esp,ebp
013EEE6A pop ebp
013EEE6B ret
Upvotes: 4
Views: 2678
Reputation: 129314
Modern processors are capable of executing instructions quite a long way ahead of where it actually is "completing" the instructions, so memory barriers are used to prevent it from running to far ahead when it comes to certain types of memory operations, where strict ordering is required - for most things, it doesn't actually matter if you write to variable a before variable b, or b before a. But sometimes it does.
The x86 instruction set has lfence
, sfence
and fence
, which are instructions that "fence in" loads, stores and all memory operations respectively. The point about a "fence" or "barrier" instruction is to ensure that all the instructions that precede the barrier instruction has completed their loads, stores or both before the next instruction after the barrier can continue.
This is important if you are implementing for example semaphores, mutexes or similar instructions, since it's important to store the value saying "I've locked the semaphore" before you continue to read other data, for example. Otherwise things can go wrong, let's say.
Note that unless you REALLY know what you are doing with memory barriers, it's probably best to NOT use them - and rely on already existing code that solves the same problem - std::atomic
are one place to fund such code. I have written quite a bit of "tricky" code, but only once or twice have I needed a memory barrier in my code.
Several times, I've needed to make the compiler not spread the code around, which you can do with "no-op functions", and apparently there are even special intrinsic functions these days to do that.
Upvotes: 6
Reputation: 54554
_ReadBarrier
, _WriteBarrier
, and _ReadWriteBarrier
are intrinsics that affect how the compiler can reorder code; they have absolutely nothing to do with CPU memory barriers and are only valid for specific kinds of memory (see "Affected Memory" here).
MemoryBarrier()
is the intrinsic that you use to force a CPU memory barrier. However, the recommendation from Microsoft is to use std::atomic<T>
going forward with VC++.
Upvotes: 10
Reputation: 153899
There are several important points to consider. Perhaps the
first is that barriers only have an effect in multithreaded
code, and most compilers require a special option to produce
multithreaded code. And things like _ReadBarrier
are almost
certainly compiler built-ins, and should do nothing unless
you've given the options for multithreaded code.
The second is that what the hardware requires, even in a multithreaded context, varies. On most of the machines I've worked on (over some forty years), the machine never required anything; barriers only become relevant if the machine has sophisticated read and write pipelines. (Most earlier machines didn't even have fence or barrier instructions, so the generated code would have to be empty.)
Upvotes: 0