Reputation: 81
In the following loop
int i = 0;
// pass address of i to some other part of the program
// Do some work
for (; i < 10;)
{
// Do some more work
function_that_increments_i();
}
if for some (weird, theoretical) reason the compiler optimizes away memory read of i
in i < 10
and always reads the first value loaded, would this behaviour be standards compliant? I would like to know which part of the C++ language delineates the behaviour in such scenarios.
EDIT: After discussing with @Rakete1111, I would like the expert elucidation on the following language and its implications:
The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [ Note: More stringent correspondences between abstract and actual semantics may be defined by each implementation. —end note ]
The way I interpret it, this allows for reads of i
to be optimized. What am I missing here?
Upvotes: 1
Views: 124
Reputation: 241671
The standard essentially requires the execution of a program to do "the same thing" as a step by step execution of the abstract machine would do. The execution of the abstract machine for each operation is defined by the various semantic provisions of the standard, and it is important to understand that the abstract machine has no optimizations, takes no short cuts, and diligently obeys every operation in the program as written. Noting in the semantics of the abstract machine permits it to not do something which the program as written does (read the value of a variable, for example).
Optimisations (or, in general, program modifications) are allowed by the "as-if" rule if the modification cannot change the observable behaviour of a well-formed program with no Undefined Behaviour. For example, if a value is computed but then never used in a way which might be visible (by being written out, for example), then the computation can be eliminated. But to do that optimisation, the compiler must be able to prove that the value will never be used. It can't just guess that it might not be used.
In essence, it is almost impossible to eliminate a call to a non-inlined function, such as the one in your question. The signature of the function does not indicate whether or not the function has some observable behaviour (such as writing to stdout), so it must be called.
Similarly, if a variable is not declared volatile
, a read from the variable can only be eliminated if the compiler can prove that the value will not be used or that the value can be deduced without reading the variable. One possibility is that the compiler can prove that a well-formed program without Undefined Behaviour cannot modify the value of the variable, allowing the use of a previously read value. (If the variable is declared const
, this will be easier to prove.)
In the case of a non-const variable whose address is taken, it will be very difficult for the compiler to prove that the value is not changed, and if the execution includes the invocation of an external function whose definition is not visible, it will be impossible to prove. In the absence of such a proof, the compiler is obliged to conform with the behaviour of the abstract machine.
Upvotes: 1
Reputation: 19021
Ok, let's go with the standard if you like and proove that a comformant compiler vendor will properly evaluate i
on every iteration. First let's see that is the purpose of the standard:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.
Next let's study how this abstract machine wants your program to behave. Let's see in it's definition for for
-loop statement:
The for statement
for ( for-init-statement conditionopt; expressionopt) statement
is equivalent to
{ for-init-statement while ( condition ) { statement expression ; } }
That sends us to while
statement:
A while statement of the form
while (T t = x) statement
is equivalent to
label: { // start of condition scope T t = x; if (t) { statement goto label; } }
So i < 10
is your condition.
Next we need to show that condition is while
operator is a full-expression. For this we have next excerpt from the Standard:
A full-expression is an expression that is not a subexpression of another expression. ... [ Example:
struct S { S(int i): I(i) { } int& v() { return I; } private: int I; }; S s1(1); // full-expression is call of S::S(int) S s2 = 2; // full-expression is call of S::S(int) void f() { if (S(3).v()) // full-expression includes lvalue-to-rvalue and // int to bool conversions, performed before // temporary is deleted at end of full-expression { } }
—end example ]
Here we are interseted only in the example which tells us that dondition part of conditional operator if
is a full-expression
.
Next we would like to know how full-ex[ressions are sequenced:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Cool, so we see that that all side effects which can come from function_that_increments_i()
must be evaluated before condition got evaluated next time.
Well, but why vendor of a comformant compiler must care about how abstract machine wants your program to behave? For this we have another excerpt from the standard:
This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
This concludes the proof.
I also recommend you to read about as-if rule.
Upvotes: 3
Reputation: 6131
The C++ standard describes an abstract virtual machine that runs programs in the manner it describes, if the source code is valid according to its rules. An actual C++ compiler is code implementing an environment that matches, as closely as possible, the virtual machine described in the standard. While the standard makes few mentions of specific optimizations, it does require this of compiler implementations:
4.6 paragraph 5:
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
In your case, if function_that_increments_i() definition is not available for the compiler to see, then it's unable to prove that it doesn't modify i, and must pessimistically/defensively assume that it does, and not apply a modification that has this prerequisite.
However, if the definition of function_that_increments_i() is visible, can be inlined, and it doesn't pass references to i to other functions it cannot track, then it may still be able to apply a series of inlining / reducing optimizations. Once the call to function_that_increments_i() is removed, it may become possible to eliminate (or change representation of) i.
The important thing is that it must maintain the same behavior.
Upvotes: 4
Reputation: 48938
No it wouldn't, because your loop ends after 10 iterations, but if the compiler optimizes away i
, then your loop is infinite. This is a violation of the informal as-if rule: Every optimization is allowed, as long as you can't notice that such an optimization took place.
Basically, if you have:
// sum is long = 0, n is int something
for (int i = 0; i <= n; ++i)
sum += i;
Then the compiler is allowed to rewrite it as:
sum = n + n(n-1)/2
Because you cannot notice the difference, as the two do the exact same thing, "as if" basically.
Upvotes: 1