Reputation: 2322
For example, can this
unsigned f(float x) {
unsigned u = *(unsigned *)&x;
return u;
}
cause unpredictable results on a platform where,
unsigned
and float
are both 32-bitunsigned
and float
can be stored to and loaded from the same part of memory.I know about strict aliasing rules, but most examples showing problematic cases of violating strict aliasing is like the following.
static int g(int *i, float *f) {
*i = 1;
*f = 0;
return *i;
}
int h() {
int n;
return g(&n, (float *)&n);
}
In my understanding, the compiler is free to assume that i
and f
are implicitly restrict
. The return value of h
could be 1
if the compiler thinks *f = 0;
is redundant (because i
and f
can't alias), or it could be 0
if it puts into account that the values of i
and f
are the same. This is undefined behaviour, so technically, anything else can happen.
However, the first example is a bit different.
unsigned f(float x) {
unsigned u = *(unsigned *)&x;
return u;
}
Sorry for my unclear wording, but everything is done "in-place". I can't think of any other way the compiler might interpret the line unsigned u = *(unsigned *)&x;
, other than "copy the bits of x
to u
".
In practice, all compilers for various architectures I tested in https://godbolt.org/ with full optimization produce the same result for the first example, and varying results (either 0
or 1
) for the second example.
I know it's technically possible that unsigned
and float
have different sizes and alignment requirements, or should be stored in different memory segments. In that case even the first code won't make sense. But on most modern platforms where the following holds, is the first example still undefined behaviour (can it produce unpredictable results)?
unsigned
and float
are both 32-bitunsigned
and float
can be stored to and loaded from the same part of memory.In real code, I do write
unsigned f(float x) {
unsigned u;
memcpy(&u, &x, sizeof(x));
return u;
}
The compiled result is the same as using pointer casting, after optimization. This question is about interpretation of the standard about strict aliasing rules for code such as the first example.
Upvotes: 1
Views: 266
Reputation: 81247
The Standard was never intended to fully, accurately, and unambiguously partition programs that have defined behavior and those that don't(*), but instead relies upon compiler writers to exercise a certain amount of common sense.
(*) If it was intended for that purpose, it fails miserably, as evidenced by the amount of confusion stemming from it.
Consider the following two code snippets:
/* Assume suitable declarations of u are available everywhere */
union test { uint32_t ww[4]; float ff[4]; } u;
/* Snippet #1 */
uint32_t proc1(int i, int j)
{
u.ww[i] = 1;
u.ff[j] = 2.0f;
return u.ww[i];
}
/* Snippet #2, part 1, in one compilation unit */
uint32_t proc2a(uint32_t *p1, float *p2)
{
*p1 = 1;
*p2 = 2.0f;
return *p1;
}
/* Snippet #2, part 2, in another compilation unit */
uint32_t proc2(int i, int j)
{
return proc2a(u.ww+i, u.ff+j);
}
It is clear that the authors of the Standard intended that the first version of the code be processed meaningfully on platforms where that would make sense, but it's also clear that at least some of the authors of C99 and later versions did not intend to require that the second version be processed likewise (some of the authors of C89 may have intended that the "strict aliasing rule" only apply to situations where a directly named object would be accessed via pointer of another type, as shown in the example given in the published Rationale; nothing in the Rationale suggests a desire to apply it more broadly).
On the other hand, the Standard defines the [] operator in such a fashion that proc1
is semantically equivalent to:
uint32_t proc3(int i, int j)
{
*(u.ww+i) = 1;
*(u.ff+j) = 2.0f;
return *(u.ww+i);
}
and there's nothing in the Standard that would imply that proc() shouldn't have the same semantics. What gcc and clang seem to do is special-case the []
operator as having a different meaning from pointer dereferencing, but nothing in the Standard makes such a distinction. The only way to consistently interpret the Standard is to recognize that the form with []
falls in the category of actions which the Standard doesn't require that implementations process meaningfully, but relies upon them to handle anyway.
Constructs such as yours example of using a directly-cast pointer to access storage associated with an object of the original pointer's type fall in a similar category of constructs which at least some authors of the Standard likely expected (and would have demanded, if they didn't expect) that compilers would handle reliably, with or without a mandate, since there was no imaginable reason why a quality compiler would do otherwise. Since then, however, clang and gcc have evolved to defy such expectations. Even if clang and gcc would normally generate useful machine code for a function, they seek to perform aggressive inter-procedural optimizations that make it impossible to predict what constructs will be 100% reliable. Unlike some compilers which refrain from applying potential optimizing transforms unless they can prove that they are sound, clang and gcc seek to perform transforms that can't be proven to affect program behavior.
Upvotes: 2
Reputation: 365517
As Kamil explains, it's UB. Even int
and long
(or long
and long long
) aren't alias-compatible even when they're the same size. (But interestingly, unsigned int
is compatible with int
)
It's nothing to do with being the same size, or using the same register-set as suggested in a comment, it's mainly a way to let compilers assume that different pointers don't point to overlapping memory when optimizing. They still have to support C99 union
type-punning, not just memcpy
. So for example a dst[i] = src[i]
loop doesn't need to check for possible overlap when unrolling or vectorizing, if dst and src have different types.1
If you're accessing the same integer data, the standard requires that you use the exact same type, modulo only things like signed
vs. unsigned
and const
. Or that you use (unsigned) char*
, which is like GNU C __attribute__((may_alias))
.
The other part of your question seems to be why it appears to work in practice, despite the UB.
Your godbolt link forgot to link the actual compilers you tried.
https://godbolt.org/z/rvj3d4e4o shows GCC4.1, from before GCC went out of its way to support "obvious" local compile-time-visible cases like this, to sometimes not break people's buggy code using non-portable idioms like this.
It loads garbage from stack memory, unless you use -fno-strict-aliasing
to make it movd
to that location first. (Store/reload instead of movd %xmm0, %eax
is a missed-optimization bug that's been fixed in later GCC versions for most cases.)
f: # GCC4.1 -O3
movl -4(%rsp), %eax
ret
f: # GCC4.1 -O3 -fno-strict-aliasing
movss %xmm0, -4(%rsp)
movl -4(%rsp), %eax
ret
Even that old GCC version warns warning: dereferencing type-punned pointer will break strict-aliasing rules
which should make it obvious that GCC notices this and does not consider it well-defined. Later GCC that do choose to support this code still warn.
It's debatable whether it's better to sometimes work in simple cases, but break other times, vs. always failing. But given that GCC -Wall
does still warn about it, that's probably a good tradeoff between convenience for people dealing with legacy code or porting from MSVC. Another option would be to always break it unless people use -fno-strict-aliasing
, which they should if dealing with codebases that depend on this behaviour.
Just the opposite; it would take tons of extra work to actually trap on every signed overflow in the C abstract machine, for example, especially when optimizing stuff like 2 + c - 3
into c - 1
. That's what gcc -fsanitize=undefined
tries to do, adding x86 jo
instructions after additions (except it still does constant-propagation so it's just adding -1
, not detecting temporary overflow on INT_MAX. https://godbolt.org/z/WM9jGT3ac). And it seems strict-aliasing is not one of the kinds of UB it tries to detect at run time.
See also the clang blog article: What Every C Programmer Should Know About Undefined Behavior
For example, MSVC always defines this aliasing behaviour, like GCC/clang/ICC do with -fno-strict-aliasing
. Of course, that doesn't change the fact that pure ISO C leaves it undefined.
It just means that on those specific C implementations, the code is guaranteed to work the way you want, rather than happening to do so by chance or by de-facto compiler behaviour if it's simple enough for modern GCC to recognize and do the more "friendly" thing.
Just like gcc -fwrapv
for signed-integer overflows.
#define QUALIFIER // restrict
void convert(float *QUALIFIER pf, const int *pi) {
for(int i=0 ; i<10240 ; i++){
pf[i] = pi[i];
}
}
Godbolt shows that with the -O3
defaults for GCC11.2 for x86-64, we get just a SIMD loop with movdqu
/ cvtdq2ps
/ movups
and loop overhead. With -O3 -fno-strict-aliasing
, we get two versions of the loop, and an overlap check to see if we can run the scalar or the SIMD version.
Is there actual cases where strict aliasing helps better code generation, in which the same cannot be achieved with
restrict
You might well have a pointer that might point into either of two int
arrays, but definitely not at any float
variable, so you can't use restrict
on it. Strict-aliasing will let the compiler still avoid spill/reload of float
objects around stores through the pointer, even if the float
objects are global vars or otherwise aren't provably local to the function. (Escape analysis.)
Or a struct node *
that definitely isn't the same type as the payload in a tree.
Also, most code doesn't use restrict
all over the place. It could get quite cumbersome. Not just in loops, but in every function that deals with pointers to structs. And if you get it wrong and promise something that's not true, your code's broken.
Upvotes: 2
Reputation: 141698
Is it always undefined behaviour to copy the bits of a variable through an incompatible pointer?
Yes.
The rule is https://port70.net/~nsz/c/c11/n1570.html#6.5p7 :
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
The effective type of the object x
is float
- it is defined with that type.
unsigned
is not compatible with float
,unsigned
is not a qualified version of float
,unsigned
is not a signed or unsigned type of float
,unsigned
is not a signed or unsigned type corresponding to qualified version of float
,unsigned
is not an aggregate or union typeunsigned
is not a character type.The "shall" is violated, it is undefined behavior (see https://port70.net/~nsz/c/c11/n1570.html#4p2 ). There is no other interpretation.
We also have https://port70.net/~nsz/c/c11/n1570.html#J.2 :
The behavior is undefined in the following circumstances:
- An object has its stored value accessed other than by an lvalue of an allowable type (6.5).
Upvotes: 3