Reputation: 18276
While trying to debug a problem I'm having using Speex, I noticed that it (well, not just Speex, but some example code as well) does the following:
It so happens that the definition of EncState starts with a field of type SpeexMode *, and so the integer values of a pointer to the first field and a pointer to the struct happen to be the same. The dereference happens to work at runtime.
But... does the language actually allow this? Is the compiler free to do whatever it wants if it compiles this? Is casting a struct T* to a struct C* undefined behavior, if T's first field is a C?
Upvotes: 5
Views: 688
Reputation: 81149
Every version of the Standard has treated support for many aliasing constructs as a Quality of Implementation issue, since it would have been essentially impossible to write rules which supported all useful constructs, didn't block any useful optimizations, and could be supported by all compilers without significant rework. Consider the following function:
struct foo { int length; int *dat; };
int test1(struct foo *p)
{
    int *ip = &p->length;
    *ip = 2;
    return p->length;
}
I think it's rather clear that any quality compiler should be expected to handle the possibility that an object of type struct foo might be affected by the assignment to *ip. On the other hand, consider the function:
void test2(struct foo *p)
{
    int i;
    for (i = 0; i < p->length; i++)
        p->dat[i] = 0;
}
Should a compiler be required to make allowances for the possibility that writing to p->dat[i] might affect the value of p->length, e.g. by reloading the value of p->length after at least the first iteration of the loop?
I think some members of the Committee may have intended to require that compilers make such allowance, but I don't think they all did, and the rules as written wouldn't require it since they list the types of lvalue that may be used to access an object of type struct foo, and int is not among them. Some people may think the omission was accidental, but I think it was based on an expectation that compilers would interpret the rule as requiring that objects which are accessed as some particular type in some context be accessed by lvalues which have a visible association with an object of one of the listed types, within that context. The question of what constitutes a "visible association" was left as a QoI issue outside the Standard's jurisdiction, but compiler writers were expected to make reasonable efforts to recognize associations when practical.
Within a function like test1, the pointer p is used to derive ip, and p is not used in any other fashion to access p->length between the formation of ip and its last usage. Thus, compilers should have no difficulty recognizing that a store to *ip cannot be reordered across the later read of p->length, even without a general rule giving blanket permission to use pointers of type int* to access int members of unrelated structures. Within test2, however, there is no visible means by which the address of p->length could have been used in the computation of pointer p->dat, and thus it would be reasonable for optimizing compilers intended for most common purposes to hoist the read of p->length before the loop in the expectation that its value won't change.
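To make the hoisting concrete, here is a minimal sketch of what such an optimization amounts to when written out by hand. The names test2_hoisted and hoist_demo are hypothetical, added only for illustration; no particular compiler is claimed to produce exactly this. The two loops agree whenever no store through p->dat modifies p->length, which is the case exercised below.

```c
#include <assert.h>

struct foo { int length; int *dat; };

/* As in the answer: p->length is nominally re-read on every iteration. */
static void test2(struct foo *p)
{
    int i;
    for (i = 0; i < p->length; i++)
        p->dat[i] = 0;
}

/* Hypothetical hand-hoisted form: p->length is read once, before the loop.
   It matches test2 only if no store through p->dat changes p->length. */
static void test2_hoisted(struct foo *p)
{
    int i;
    int n = p->length;
    for (i = 0; i < n; i++)
        p->dat[i] = 0;
}

/* Runs both versions on separate, non-aliasing buffers.
   Returns 1 if they leave the buffers in the same state, 0 otherwise. */
static int hoist_demo(void)
{
    int a[4] = { 1, 2, 3, 4 };
    int b[4] = { 1, 2, 3, 4 };
    struct foo fa = { 4, a };
    struct foo fb = { 4, b };
    int i;

    test2(&fa);
    test2_hoisted(&fb);

    for (i = 0; i < 4; i++) {
        if (a[i] != 0 || b[i] != 0)
            return 0;
    }
    assert(fa.length == 4 && fb.length == 4);
    return 1;
}
```

If p->dat were instead made to point at p->length itself, the two forms could diverge, which is exactly the case the paragraph above argues a compiler need not be required to anticipate.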
Rather than making any effort to recognize the types of object from which a pointer is derived, clang and gcc instead opt to behave as though the Standard gives general permission to access struct (but not union!) members using pointers of their types. This is allowable but not required by the Standard (a conforming but garbage-quality implementation could process test1 in arbitrary meaningless fashion), but the blindness to pointer derivation needlessly restricts the range of constructs available to programmers, and makes it necessary to forgo what should be useful optimizations such as those exemplified by test2().
Overall, the correct answer to almost any question related to aliasing in C is "that's a quality-of-implementation issue". Observations about what clang and gcc do may be useful for people who need to appease the -fstrict-aliasing mode of those compilers, but have little to do with what the Standard actually says.
Upvotes: -2
Reputation: 59811
From the C11 standard:
(C11 §6.7.2.1, paragraph 15: "A pointer to a structure object, suitably converted, points to its initial member ... and vice versa. There may be unnamed padding within a structure object, but not at its beginning.")
Which means that the behavior you see is allowed and guaranteed.
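As a minimal sketch of that guarantee, the code below uses hypothetical stand-in types (struct mode, struct enc_state) rather than the real Speex definitions, and the helper names state_mode and first_member_demo are likewise made up for illustration. It shows an opaque pointer to the struct being converted to a pointer to the initial member and then dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Speex definitions. */
struct mode { int id; };

struct enc_state {
    const struct mode *mode;  /* initial member */
    int frame_size;
};

/* Recovers the initial member from an opaque pointer to the whole struct.
   The converted pointer designates the initial member, whose type is
   'const struct mode *', so the dereference reads that member. */
static const struct mode *state_mode(void *state)
{
    return *(const struct mode **)state;
}

/* Builds a state, checks the layout guarantee, and returns the mode id
   read back through the converted pointer. */
static int first_member_demo(void)
{
    static const struct mode m = { 7 };
    struct enc_state st = { &m, 160 };

    /* No padding before the initial member, so the addresses coincide. */
    assert(offsetof(struct enc_state, mode) == 0);
    assert((void *)&st == (void *)&st.mode);

    return state_mode(&st)->id;
}
```

Note that the guarantee covers only the initial member; converting to a pointer to any later member's type this way would not be supported by the quoted paragraph.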
Upvotes: 8