xmllmx
xmllmx

Reputation: 42235

How does memory barrier work?

Under Windows, there are three compiler-intrinsic functions to implement memory barrier:

1. _ReadBarrier;

2. _WriteBarrier;

3. _ReadWriteBarrier;

However, I found a weird problem: _ReadBarrier seems a dummy function doing nothing! The following is my assembly code generated by VC++ 2012.

My question is: How to implement a memory barrier function in assembly instructions?

int main()
{   
013EEE10  push        ebp  
013EEE11  mov         ebp,esp  
013EEE13  sub         esp,0CCh  
013EEE19  push        ebx  
013EEE1A  push        esi  
013EEE1B  push        edi  
013EEE1C  lea         edi,[ebp-0CCh]  
013EEE22  mov         ecx,33h  
013EEE27  mov         eax,0CCCCCCCCh  
013EEE2C  rep stos    dword ptr es:[edi]  
    int n = 0;
013EEE2E  mov         dword ptr [n],0  
    n = n + 1;
013EEE35  mov         eax,dword ptr [n]  
013EEE38  add         eax,1  
013EEE3B  mov         dword ptr [n],eax  
    _ReadBarrier();
    n = n + 1;
013EEE3E  mov         eax,dword ptr [n]  
013EEE41  add         eax,1  
013EEE44  mov         dword ptr [n],eax 
}
013EEE56  xor         eax,eax  
013EEE58  pop         edi  
013EEE59  pop         esi  
013EEE5A  pop         ebx  
013EEE5B  add         esp,0CCh  
013EEE61  cmp         ebp,esp  
013EEE63  call        __RTC_CheckEsp (013EC3B0h)  
013EEE68  mov         esp,ebp  
013EEE6A  pop         ebp  
013EEE6B  ret 

Upvotes: 4

Views: 2678

Answers (3)

Mats Petersson
Mats Petersson

Reputation: 129314

Modern processors are capable of executing instructions quite a long way ahead of where it actually is "completing" the instructions, so memory barriers are used to prevent it from running to far ahead when it comes to certain types of memory operations, where strict ordering is required - for most things, it doesn't actually matter if you write to variable a before variable b, or b before a. But sometimes it does.

The x86 instruction set has lfence, sfence and fence, which are instructions that "fence in" loads, stores and all memory operations respectively. The point about a "fence" or "barrier" instruction is to ensure that all the instructions that precede the barrier instruction has completed their loads, stores or both before the next instruction after the barrier can continue.

This is important if you are implementing for example semaphores, mutexes or similar instructions, since it's important to store the value saying "I've locked the semaphore" before you continue to read other data, for example. Otherwise things can go wrong, let's say.

Note that unless you REALLY know what you are doing with memory barriers, it's probably best to NOT use them - and rely on already existing code that solves the same problem - std::atomic are one place to fund such code. I have written quite a bit of "tricky" code, but only once or twice have I needed a memory barrier in my code.

Several times, I've needed to make the compiler not spread the code around, which you can do with "no-op functions", and apparently there are even special intrinsic functions these days to do that.

Upvotes: 6

MSN
MSN

Reputation: 54554

_ReadBarrier, _WriteBarrier, and _ReadWriteBarrier are intrinsics that affect how the compiler can reorder code; they have absolutely nothing to do with CPU memory barriers and are only valid for specific kinds of memory (see "Affected Memory" here).

MemoryBarrier() is the intrinsic that you use to force a CPU memory barrier. However, the recommendation from Microsoft is to use std::atomic<T> going forward with VC++.

Upvotes: 10

James Kanze
James Kanze

Reputation: 153899

There are several important points to consider. Perhaps the first is that barriers only have an effect in multithreaded code, and most compilers require a special option to produce multithreaded code. And things like _ReadBarrier are almost certainly compiler built-ins, and should do nothing unless you've given the options for multithreaded code.

The second is that what the hardware requires, even in a multithreaded context, varies. On most of the machines I've worked on (over some forty years), the machine never required anything; barriers only become relevant if the machine has sophisticated read and write pipelines. (Most earlier machines didn't even have fence or barrier instructions, so the generated code would have to be empty.)

Upvotes: 0

Related Questions