Using memory barriers to force in-order execution

Question

I am continuing with my idea that, by using both software and hardware memory barriers, I could disable out-of-order optimization for a specific function in code that is compiled with compiler optimization enabled. That would let me implement a software semaphore using algorithms such as Peterson's or Dekker's, which require that nothing executes out of order. I tested the following code, which contains both the SW barrier asm volatile("": : :"memory") and the GCC builtin HW barrier __sync_synchronize:

#include <stdio.h>
int main(int argc, char ** argv)
{
    int x=0;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=1;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=2;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=3;
    printf("%d",x);
    return 0;
}

But the compilation output file is:

main:
.LFB24:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    mfence
    mfence
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    mfence
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp

And if I remove the barriers and compile again, I get:

main:
.LFB24:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp

Both were compiled with gcc -Wall -O2 on Ubuntu 14.04.1 LTS, x86.

I expected the output for the code with the memory barriers to contain all the assignments from my source code, with an mfence between them.

According to a related StackOverflow post -

gcc memory barrier __sync_synchronize vs asm volatile("": : :"memory")

When adding your inline assembly on each iteration, gcc is not permitted to change the order of the operations past the barrier

And later on:

However, when the CPU performs this code, it's permitted to reorder the operations "under the hood", as long as it does not break memory ordering model. This means that performing the operations can be done out of order (if the CPU supports that, as most do these days). A HW fence would have prevented that.

But as you can see, the only difference between the code with the memory barriers and the code without them is that the former contains mfence instructions, placed in a way I did not expect, and not all the assignments are included.

Why is the output for the code with the memory barriers not what I expected? Why has the order of the mfence instructions been altered? Why did the compiler remove some of the assignments? Is the compiler allowed to make such optimizations even when a memory barrier is applied and separates every single line of code?

References to the memory barrier types and usage:

Memory Barriers - http://bruceblinn.com/linuxinfo/MemoryBarriers.html
GCC Builtins - https://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Atomic-Builtins.html

a3f · Accepted Answer

Memory barriers tell the compiler/CPU that instructions shouldn't be reordered across the barrier; they don't mean that writes that can be proven pointless have to be performed anyway.

If you define your x as volatile, the compiler can't assume that it's the only entity that cares about x's value, and it has to follow the rules of the C abstract machine, which require the memory write to actually happen.

In your specific case you could then skip the barriers, because it's already guaranteed that volatile accesses aren't reordered against each other.

If you have C11 support, you are better off using _Atomics, which additionally can guarantee that normal assignments won't be reordered against your x and that the accesses are atomic.

EDIT: GCC (as well as clang) seems to be inconsistent in this regard and won't always perform this optimization. I opened a GCC bug report regarding this.
