After more than two decades of C++ programming, I have finally reached a point where I do not understand which types of pointer reinterpret-casts yield well-defined behaviour and which ones result in undefined behaviour due to strict aliasing rules...
I will be referring to the pointer-interconvertibility clause of the C++ standard, specifically the C++17 standard, because that is the latest version I have access to, but feel free to refer to any newer version. In section 6.9.2 Compound Types of the C++17 standard, paragraph 4 states:
Two objects a and b are pointer-interconvertible if:
- (4.1) - they are the same object, or
- (4.2) - one is a standard-layout union object and the other is a non-static data member of that object (12.3), or
- (4.3) - one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object (12.2), or
- (4.4) - there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast (8.2.10) . [Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
Now let's consider the following piece of example code:
#include <iostream>

struct Foo
{
    int x;
    int y;
};

struct Bar
{
    int a[2];
};

struct Qux
{
    union
    {
        int a[2];
        int dummy;
    };
};

int main()
{
    int* primes = new int[4];
    primes[0] = 2;
    primes[1] = 3;
    primes[2] = 5;
    primes[3] = 7;

    Foo* foo = reinterpret_cast<Foo*>(primes);
    std::cout << "Foo[1].y = " << foo[1].y << std::endl;

    Bar* bar = reinterpret_cast<Bar*>(primes);
    std::cout << "Bar[1].a[1] = " << bar[1].a[1] << std::endl;

    Qux* qux = reinterpret_cast<Qux*>(primes);
    std::cout << "qux[1].a[1] = " << qux[1].a[1] << std::endl;

    delete [] primes;
}
I am now trying to figure out which (if any) of the three reinterpret_cast calls and their subsequent console outputs yield well-defined behaviour.
According to my understanding of the above quoted section of the standard, the cast Foo* foo = reinterpret_cast<Foo*>(primes) should be well-defined, because the first non-static data member of Foo is an object of type int and therefore by (4.3) Foo and int should be pointer-interconvertible.
On the other hand, the cast Bar* bar = reinterpret_cast<Bar*>(primes) should result in undefined behaviour, because the first non-static data member of Bar is an array of int, which is not pointer-interconvertible with int according to the note.
Now let's take a look at the third cast, Qux* qux = reinterpret_cast<Qux*>(primes): the first non-static data member of Qux is an anonymous union, and so by (4.3) Qux and that anonymous union are pointer-interconvertible. Furthermore, the non-static data member dummy of the anonymous union is an int object, and therefore by (4.2) the anonymous union is pointer-interconvertible with int. Finally, by applying (4.4), Qux is pointer-interconvertible with int, and therefore the cast should be well-defined.
Now, since both foo and qux should be well-defined, accessing the fourth array element of primes, which stores the value 7, should be safe via both foo[1].y and qux[1].a[1], right?
Am I right that the above code does not break strict aliasing rules, since in all of the above examples the objects accessed via the class members are of type int and therefore no type punning is happening?
Edit: 2024-08-19
Thanks to the comments, I realize that the pointer-interconvertibility clause from the C++ standard, which I cited above, relates to objects rather than types and is therefore not applicable in the code example I posted above.
However, the question of which of the above reinterpret-casts result in undefined behaviour and which are well-defined still remains, and I have found more ammunition in the standard to argue that at least the indirections via Foo and Bar should not result in undefined behaviour.
If we look at the definition of aggregate in section 11.6.1 Aggregates [dcl.init.aggr] it states in the first paragraph:
An aggregate is an array or a class (Clause 12) with
- (1.1) - no user-provided, explicit, or inherited constructors (15.1),
- (1.2) - no private or protected non-static data members (Clause 14),
- (1.3) - no virtual functions (13.3), and
- (1.4) - no virtual, private, or protected base classes (13.1).
and in the following second paragraph
The elements of an aggregate are:
- (2.1) - for an array, the array elements in increasing subscript order, or
- (2.2) - for a class, the direct base classes in declaration order, followed by the direct non-static data members (12.2) that are not members of an anonymous union, in declaration order.
So, according to the above definitions, at least the two classes Foo and Bar should formally count as aggregates.
Now, in section 6.10 Lvalues and rvalues [basic.lval] of the C++17 standard we have the famous paragraph 8, which is usually cited when it comes to answering questions related to type-punning:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- (8.1) - the dynamic type of the object,
- (8.2) - a cv-qualified version of the dynamic type of the object,
- (8.3) - a type similar (as defined in 7.5) to the dynamic type of the object,
- (8.4) - a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- (8.5) - a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- (8.6) - an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- (8.7) - a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- (8.8) - a char, unsigned char, or std::byte type.
Now both Foo and Bar should fall under scenario (8.6), so at least to my understanding of the above rules it should be well-defined to access the individual int objects of the primes array via pointers of type Foo* and Bar*, right?