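Suppose I have C code like the following (FLAG and compiler_mem_barrier() are shown with assumed definitions so the code compiles):
#define FLAG 0x8  /* assumed value; matches the orl $0x8 in the asm below */
/* assumed definition: a GCC-style compiler-only barrier */
#define compiler_mem_barrier() __asm__ __volatile__("" ::: "memory")

int x = 0;
int y = 0;

int __attribute__ ((noinline)) func1(void)
{
    int prev = x;   /* (1) */
    x |= FLAG;      /* (2) */
    return prev;    /* (3) */
}

int main(void)
{
    int tmp;
    /* ... */
    y = 5;          /* (4) */
    compiler_mem_barrier();
    func1();
    compiler_mem_barrier();
    tmp = y;        /* (5) */
    /* ... */
}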
Suppose this is a single-threaded process, so we don't need to worry about locks. Suppose also that the code runs on an x86 system and that the compiler does not do any reordering.
I understand that the only reordering x86 allows is a read with an older write (reads may be reordered with older writes to different locations, but not with older writes to the same location). What isn't clear to me is whether call/ret instructions count as writes/reads. So here are my questions:
On x86 systems, is "call" treated as a write? I assume so, since call pushes the return address onto the stack, but I haven't found an official document that says so. Please help confirm.
For the same reason, is "ret" treated as a read (since it pops the return address from the stack)?
Can the "ret" instruction be reordered within the function? For example, can (3) execute before (2) in the asm code below? That doesn't make sense to me, but "ret" is not a serializing instruction, and I couldn't find anywhere in the Intel manual saying that "ret" cannot be reordered.
In the code above, can (1) execute before (4)? Presumably the read (1) can be reordered ahead of the write (4). The "call" instruction has a jump component, but with speculative execution... So I feel it can happen, but I hope someone more familiar with this can confirm.
In the code above, can (5) execute before (2)? If "ret" counts as a read, I assume it cannot. But again, I hope someone can confirm.
In case the assembly code for func1() is needed, it should be something like:
mov %gs:0x24,%eax      # (1)
orl $0x8,%gs:0x24      # (2)
retq                   # (3)
Out-of-order execution can reorder anything, but it preserves the illusion that your code executed in program order. The cardinal rule of OoOE is that you don't break single-threaded programs. The hardware tracks dependencies so that instructions can execute as soon as their inputs and an execution unit are ready, but it retires them in program order, so a single thread always sees the results it would get from in-order execution.
You appear to be confusing OoOE on a single core with the order in which the loads/stores become globally visible to other cores. (The store buffer decouples the two.)
If you have one thread observing the stack memory of another thread running on another core, then yes, the store generated by call (pushing a return address) will be ordered with other stores.
However, out-of-order execution in the thread running this code can actually execute call and ret instructions while a store is delayed on a cache miss, or while a long dependency chain is executing. Multiple cache misses can be in flight at once. The memory-order buffer just has to make sure that later stores don't become globally visible until after earlier stores, to preserve x86's memory-ordering semantics.
If you have a specific question about hardware reordering, you should probably post asm code, not C code, because C and C++ compilers can reorder at compile time based on the language's memory model, which doesn't change when compiling for a strongly ordered target like x86.
See also "How does memory reordering help processors and compilers?" (a Java question, but my answer isn't Java-specific).
Re: your edit
This answer was already assuming your function was noinline, and that you were talking about asm that looked like your C, not what a compiler would actually generate from your code.
mov %gs:0x24,%eax      # (1)
orl $0x8,%gs:0x24      # (2)
retq                   # (3)
So x is actually in thread-local storage, not a plain global int x. This doesn't matter for out-of-order execution, though; a load with a %gs segment override is still a load.