Reputation: 182
Assume we have the following code:
int test(bool* flag, int* y)
{
if(*y)
{
*flag = false;
}
else
{
*flag = true;
}
}
Note that the compiler can prove here that the write to flag always happens, so I think the following transformation is allowed (I don't think it is really an optimization at all; it's just for the sake of example):
int test(bool* flag, int* y)
{
*flag = true;
if(*y)
{
*flag = false;
}
}
So now we also write true to flag when *y != 0, but from the point of view of the as-if rule, this looks valid.
But I still think this optimization is weird. Assume that *y is always true (nonzero), so the flag should always be false. If some other thread reads the flag variable, it may see true, although that should never happen. So does this break the as-if rule?
Is the optimization valid?
Addition: the non-atomic case is clear, since it's UB and all bets are off. But what if the flag is atomic with relaxed ordering?
Upvotes: 0
Views: 319
Reputation: 214780
int test(bool* flag, int* y)
{
*flag = true;
As-is, it is dubious whether this is an allowed optimization. The compiler is not allowed to optimize out or reorder what the standard calls side effects, defined in C17 5.1.2.3:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects ...
/--/
The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
In this case, an lvalue access to the object pointed at by flag modifies an object, so it is a side effect. Reading from *y is not a side effect. However, writing to the same variable twice, *flag = true; ... *flag = false;, would introduce a new side effect which wasn't there in the original code.
But this is an artificial example. An optimizing compiler wouldn't do such optimizations on a "C level". Instead it would likely use a CPU register as a temporary variable, in case freeing one up and setting it to zero would somehow be more efficient. So it wouldn't write to the actual location that flag points to, but to a temporary register.
Even more likely, it would set a register to the value of the boolean expression *y == 0, then copy that register value, 1 or 0, into *flag. Notably, if I change your C code to this:
int test(bool* flag, int* y)
{
*flag = *y == 0;
}
Then I get the very same machine code, given that optimizations are enabled. On gcc x86 -O3 it might look like:
mov eax, DWORD PTR [rsi]
test eax, eax
sete BYTE PTR [rdi]
ret
That is, load *y into register eax, let test set the CPU status flags depending on whether eax is zero, and then store the result, 1 or 0, into *flag depending on those status flags.
The main optimization to consider here is likely to generate "branch-free" code, without any conditional jumps, since that avoids branch mispredictions and keeps the code compact for the instruction cache on high-end CPUs such as x86.
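As a small sketch of the branch-free idea at the source level (the function names below are made up for illustration, and the return type is changed to void to avoid a missing-return warning): the question's code stores false when *y is nonzero, so the single-expression equivalent is a comparison of *y against zero.

```c
#include <stdbool.h>

/* Branching form, as in the question: false when *y is nonzero. */
void set_flag_branchy(bool *flag, const int *y)
{
    if (*y) {
        *flag = false;
    } else {
        *flag = true;
    }
}

/* Single-expression form: one store and no branch in the source.
   Whether the generated machine code is branch-free is up to the compiler. */
void set_flag_branchless(bool *flag, const int *y)
{
    *flag = (*y == 0);
}
```

Both functions perform exactly one store to *flag per call, so neither introduces the extra intermediate write discussed above.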
Regarding multi-threading, none of this code is safe regardless of re-ordering, because a write to a C variable cannot be considered atomic unless explicitly qualified as such (_Atomic etc).
Regarding strict aliasing as brought up in other answers, it isn't really relevant here. bool* and int* are non-compatible pointer types and may not alias, so the compiler can assume that a write to *flag will not change *y, or vice versa.
Upvotes: 0
Reputation: 154169
Given:
int test(bool* flag, int* y) {
if(*y)
The following (slightly different from OP's code) is not allowed
// int test(bool* flag, int* y) {
int test(bool* flag, bool* y) {
// or
int test(int* flag, int* y) {
*flag = true;
... as pointers to the same type of data may point to the same piece of data, i.e. the pointers may be equal. Then the changed code is not functionally the same.
If code was:
int test(same_type* restrict flag, same_type* restrict y) {
Then the compiler can assume that the pointers do not point to overlapping data, even though they point to the same type.
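A minimal sketch of why aliasing matters, using the int*/int* variant. The names test_original and test_transformed are made up for illustration, and the return type is changed to void to avoid a missing-return warning. When both pointers refer to the same object holding 0, the original stores 1, while the transformed version stores 1, re-reads that 1 through y, and then stores 0:

```c
/* Original logic: a single store, chosen by the value read through y. */
void test_original(int *flag, int *y)
{
    if (*y) {
        *flag = 0;
    } else {
        *flag = 1;
    }
}

/* Hand-applied transformation: unconditional store first. */
void test_transformed(int *flag, int *y)
{
    *flag = 1;      /* if flag == y, this also changes *y */
    if (*y) {       /* ...so this test may now see the 1 just written */
        *flag = 0;
    }
}
```

Calling each function as f(&x, &x) with x == 0 leaves x == 1 for the original and x == 0 for the transformed version, so a compiler may only make this change when it knows the pointers cannot alias.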
Upvotes: 1
Reputation: 76829
The transformation is valid (based only on what the standard(s) define, i.e. assuming strict aliasing).
It is impossible for any other thread to observe the intermediate value, because no thread is allowed to read *flag while the function executes on another thread.
*flag isn't an atomic object and the function always writes to *flag, so another thread reading it unsynchronized causes undefined behavior: it is a data race regardless of the path taken.
If flag had type (std::)atomic_bool* instead, then, regardless of the memory ordering used, the transformation wouldn't generally be valid, because, as you said, it would become possible for another thread to observe a value that it shouldn't have been able to observe.
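For the atomic case, here is a sketch in C11, assuming <stdatomic.h> is available; the function names are hypothetical. In a single thread both versions leave the same final value in *flag. In the second one, however, a concurrent relaxed load could return the intermediate true even when *y is always nonzero, which the first version can never produce:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* As written by the programmer: exactly one relaxed store per call. */
void test_atomic(atomic_bool *flag, const int *y)
{
    if (*y) {
        atomic_store_explicit(flag, false, memory_order_relaxed);
    } else {
        atomic_store_explicit(flag, true, memory_order_relaxed);
    }
}

/* The transformation applied by hand: an extra store of true that
   another thread may observe before the store of false. */
void test_atomic_transformed(atomic_bool *flag, const int *y)
{
    atomic_store_explicit(flag, true, memory_order_relaxed);
    if (*y) {
        atomic_store_explicit(flag, false, memory_order_relaxed);
    }
}
```

Because the extra store is observable by a conforming program, a compiler cannot turn the first function into the second on its own.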
Upvotes: 9