Reputation: 20648
This is a follow-up to the question "Can the C compiler optimizer violate short-circuiting and reorder memory accesses for operands in a logical-AND expression?".
Consider the following code.
if (*p && *q) {
/* do something */
}
Now, as per the discussion in that question (especially David Schwartz's comment and answer), it is possible for the optimizer of a standard-conformant C compiler to emit CPU instructions that access *q before *p while still maintaining the observable behaviour of the sequence point established by the && operator.
Therefore, although the optimizer may emit code that accesses *q before *p, it still needs to ensure that any side effects of *q (such as a segmentation fault) are observable only if *p is non-zero. If *p is zero, then a fault due to *q should not be observable, i.e. a speculative fault would occur first because *q is executed first on the CPU, but that speculative fault would be discarded once *p is executed and found to be 0.
My question: How is this speculative fault implemented under the hood?
I would appreciate it if you could shed more light on the following points while answering this question.
Upvotes: 1
Views: 104
Reputation: 52548
In C and C++, you have the "as-if" rule, which means the compiler can do whatever it likes as long as the observable behaviour is what the language promises.
If the compiler generates code for an ancient processor without memory protection, where reading *q will read something (an unspecified value) without any side effects, then clearly it is allowed to read *q, and even to exchange the order of the tests. Likewise, any compiler can swap the operands in (x > 0 || y > 0), provided y has a defined value or reading y with an undefined value has no side effects.
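A minimal sketch of the operand swap described above, assuming both operands are plain int values that are already defined, so reading either one has no side effects. The function names and the forms_agree helper are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Source order: x is tested first, y only if (x > 0) is false. */
static bool either_positive(int x, int y) {
    return x > 0 || y > 0;
}

/* What an optimizer may effectively produce: y is tested first.
   This is valid only because reading a defined int has no side effects. */
static bool either_positive_swapped(int x, int y) {
    return y > 0 || x > 0;
}

/* Hypothetical helper: checks that both forms agree on every pair
   of values in the range [lo, hi]. */
static bool forms_agree(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; ++x)
        for (int y = lo; y <= hi; ++y)
            if (either_positive(x, y) != either_positive_swapped(x, y))
                return false;
    return true;
}
```

Because no caller can tell the two forms apart, the "as-if" rule lets the compiler pick whichever order it prefers.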
But you are asking about speculative execution in the processor. Processors do execute instructions after a conditional branch before they know whether the branch is taken, but they make 100% sure that this doesn't lead to any side effects visible to the program. There is never any code for this; it is all within the CPU. If speculative execution does something that should generate a trap, then the CPU waits until it knows for sure whether the branch was taken, and then it either takes the trap or it doesn't. Your code doesn't see it, and even the OS doesn't see it.
Upvotes: 2
Reputation: 182779
It is implemented as part of the normal speculative fetching process. The result of a speculative fetch, whether it's a numerical result or a fault, is speculative. It is used if, and only if, it is later needed.
> As far as I know, when the CPU detects a fault, it generates a trap, that the kernel must handle (either take recovery action such as page swap, or signal the fault such as SIGSEGV to the process). Am I correct?
The result of non-speculatively executing a fetch that produces a fault is a trap. The result of speculatively executing a fetch that produces a fault is a speculative trap, which will actually occur only if the result of the speculative fetch is used. If you think about it, speculative fetches would be impossible without this mechanism.
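The idea that a fault is recorded first and raised only if the result is needed can be illustrated with a toy software model. This is only an analogy: real CPUs do this entirely in hardware, and every name below (spec_load, load_speculatively, and_with_early_q, the outcome enum) is hypothetical. NULL stands in for any address that would fault, and the sketch assumes *p itself is readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: real hardware tracks this state internally. */
typedef struct {
    bool faulted;  /* the load would have trapped */
    int  value;    /* meaningful only if !faulted */
} spec_load;

/* "Speculative" load: records a fault instead of raising it.
   NULL stands in for any faulting address. */
static spec_load load_speculatively(const int *addr) {
    spec_load r = { false, 0 };
    if (addr == NULL)
        r.faulted = true;
    else
        r.value = *addr;
    return r;
}

typedef enum { RESULT_FALSE, RESULT_TRUE, RESULT_TRAP } outcome;

/* Models `*p && *q` with the load of *q started before *p is known. */
static outcome and_with_early_q(const int *p, const int *q) {
    assert(p != NULL);                     /* sketch assumes *p is readable */
    spec_load sq = load_speculatively(q);  /* issued early */
    if (*p == 0)
        return RESULT_FALSE;  /* sq is discarded; its fault never becomes visible */
    if (sq.faulted)
        return RESULT_TRAP;   /* the result is needed, so the fault becomes real */
    return sq.value != 0 ? RESULT_TRUE : RESULT_FALSE;
}
```

In this model, calling and_with_early_q with *p equal to 0 and q equal to NULL yields RESULT_FALSE rather than RESULT_TRAP, which mirrors the rule that a speculative fault only takes effect if the speculative result is used.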
> So if the compiler must emit code to perform speculative fault, it appears to me that the kernel and the compiler (and possibly the CPU too) must all cooperate with each other to implement speculative fault. How does the compiler emit instructions that would tell the kernel or the CPU that a fault generated due to the code should be considered speculative?
The compiler does it by placing the fetch for *q after a test on the result of *p. That signals to the CPU that the fetch is speculative and that it can use the result only once the outcome of the test on *p is known.
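As a rough sketch of that ordering, the condition *p && *q can be written out with the branch made explicit. The function name both_nonzero is hypothetical; the point is only that, in program order, the load of *q sits after the branch on *p.

```c
#include <assert.h>
#include <stddef.h>

/* Equivalent to: return *p && *q; */
static int both_nonzero(const int *p, const int *q) {
    assert(p != NULL);  /* p is always dereferenced */
    if (*p == 0)
        return 0;       /* q is never dereferenced on this path */
    return *q != 0;     /* the load of *q comes after the branch on *p */
}
```

With this shape, passing an invalid q is harmless whenever *p is 0. For example, both_nonzero(&zero, NULL) with zero set to 0 returns 0 without touching q.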
The CPU can, and does, perform the fetch of *q before it knows whether it needs it or not. This is nearly essential, because a fetch can require inter-core operations, which are slow, and you wouldn't want to wait any longer than needed. So modern multi-core CPUs implement aggressive speculative fetching.
This is what modern CPUs do. (The answer for CPUs with explicit speculative fetch operations is different.)
Upvotes: 3