Reputation: 3930
I am pursuing the idea that, by combining software and hardware memory barriers, I could disable out-of-order optimization for a specific function in code that is compiled with compiler optimization, and could therefore implement a software semaphore using algorithms such as Peterson's or Dekker's, which require that no out-of-order execution takes place. I tested the following code, which contains both the SW barrier asm volatile("": : :"memory") and the gcc builtin HW barrier __sync_synchronize():
#include <stdio.h>
int main(int argc, char ** argv)
{
    int x=0;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=1;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=2;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=3;
    printf("%d",x);
    return 0;
}
But the compilation output file is:
main:
.LFB24:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
mfence
mfence
movl $3, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
mfence
call __printf_chk
xorl %eax, %eax
addq $8, %rsp
And if I remove the barriers and compile again, I get:
main:
.LFB24:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $3, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
xorl %eax, %eax
addq $8, %rsp
Both were compiled with gcc -Wall -O2 on Ubuntu 14.04.1 LTS, x86.
I expected the output for the code with the memory barriers to contain every assignment from my source code, with an mfence between each of them.
According to a related StackOverflow post (gcc memory barrier __sync_synchronize vs asm volatile("": : :"memory")):
When adding your inline assembly on each iteration, gcc is not permitted to change the order of the operations past the barrier
And later on:
However, when the CPU performes this code, it's permitted to reorder the operations "under the hood", as long as it does not break memory ordering model. This means that performing the operations can be done out of order (if the CPU supports that, as most do these days). A HW fence would have prevented that.
But as you can see, the only difference between the code with the memory barriers and the code without them is that the former contains mfence instructions, placed where I did not expect them, and not all of the assignments are present.
Why is the output for the code with the memory barriers not what I expected? Why has the order of the mfence instructions been altered? Why did the compiler remove some of the assignments? Is the compiler allowed to make such optimizations even when a memory barrier separates every single line of code?
References to the memory barrier types and usage:
Memory Barriers - http://bruceblinn.com/linuxinfo/MemoryBarriers.html
GCC Builtins - https://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Atomic-Builtins.html
Upvotes: 6
Views: 5065
Reputation: 8657
Memory barriers tell the compiler/CPU that instructions shouldn't be reordered across the barrier; they don't mean that writes that can be proven pointless have to be performed anyway.
If you define your x as volatile, the compiler can't assume that it's the only entity that cares about x's value, and it has to follow the rules of the C abstract machine, which require the memory write to actually happen.
In your specific case you could then skip the barriers, because it's already guaranteed that volatile accesses aren't reordered against each other.
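As a rough sketch of the volatile variant described above (the function name store_sequence and the use of a file-scope variable are assumptions made for illustration, not taken from the question), something like the following can be compiled with gcc -O2 -S to inspect the result. Because every volatile access is an observable side effect, the three stores should all show up in the generated assembly, in source order, with no barriers in the source:

```c
#include <stdio.h>

/* volatile: each read and write of x is an observable side effect,
 * so the compiler may not drop or merge the stores below. */
volatile int x = 0;

/* Hypothetical helper, named here only for illustration. */
int store_sequence(void)
{
    x = 1;      /* must be emitted */
    x = 2;      /* must be emitted */
    x = 3;      /* must be emitted */
    return x;   /* volatile read: reloaded from memory */
}
```

Note that volatile only constrains the compiler. It does not insert any hardware fence and does not make the accesses atomic.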
If you have C11 support, you are better off using _Atomic variables, which can additionally guarantee that normal assignments won't be reordered against your x and that the accesses are atomic.
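A minimal sketch of the C11 route, assuming a compiler that ships <stdatomic.h> (GCC 4.9 or later, for example) and is invoked with -std=c11. The names payload and publish are hypothetical and only serve the illustration. atomic_store and atomic_load default to sequentially consistent ordering, so the ordinary write to payload cannot be moved after the first store to x:

```c
#include <stdatomic.h>

_Atomic int x = 0;   /* atomic flag-like variable */
int payload = 0;     /* ordinary, non-atomic data (hypothetical) */

/* Hypothetical helper, named here only for illustration. */
int publish(void)
{
    payload = 42;           /* normal assignment, ordered before the store below */
    atomic_store(&x, 1);    /* sequentially consistent store */
    atomic_store(&x, 2);
    atomic_store(&x, 3);
    return atomic_load(&x); /* sequentially consistent load */
}
```

A thread that observes one of these values of x through atomic_load is then also guaranteed to see payload == 42, which is the kind of guarantee algorithms like Peterson's rely on.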
EDIT: GCC (as well as clang) seems to be inconsistent in this regard and won't always do this optimization. I opened a GCC bug report regarding this.
Upvotes: 4