Reputation: 379
Type punning
A form of pointer aliasing where two pointers refer to the same location in memory but represent that location as different types. The compiler will treat both "puns" as unrelated pointers. Type punning has the potential to cause dependency problems for any data accessed through both pointers.
What is this article trying to say? What happens if I use type punning, and what happens if I don't?
Upvotes: 30
Views: 20441
Reputation: 213286
As it says, type punning is when you have two pointers of different types, both pointing at the same location. Example:
// BAD CODE
uint32_t data;
uint32_t* u32 = &data;
uint16_t* u16 = (uint16_t*)&data;
*u16 = ... // de-referencing invokes undefined behavior
This code invokes undefined behavior in C++ (and C) since you aren't allowed to access the same memory location through pointers of non-compatible types (with a few special exceptions). This is informally called a "strict aliasing violation" since it violates the strict aliasing rule.
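One of the special exceptions mentioned above is that any object may be inspected through a pointer to unsigned char. The following is only a minimal sketch of that exception; the helper name count_nonzero_bytes is made up for illustration and does not come from the question or the quoted article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts the non-zero bytes in the object
// representation of a uint32_t.
std::size_t count_nonzero_bytes(const std::uint32_t* value)
{
    assert(value != nullptr);
    // Reading an object through unsigned char* is permitted by the
    // aliasing rules, unlike reading it through uint16_t*.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(value);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof *value; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

The result does not depend on byte order, because the function only counts bytes and never interprets their position.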
Another way of doing type punning is through unions:
// BAD C++ CODE
typedef union
{
uint32_t u32;
uint16_t u16 [2];
} my_type;
my_type mt;
mt.u32 = 1;
std::cout << mt.u16[0]; // access union data through another member, undefined behavior
This is also undefined behavior in C++ (but allowed and perfectly fine in C).
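A commonly used way to get the same effect in C++ without the union is to copy the bytes with std::memcpy. The sketch below assumes that is acceptable for your use case; the function name half_of is hypothetical, and which half ends up at index 0 depends on the byte order of the machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the 16-bit half of `value` stored at
// position `index` (0 or 1) in memory, like mt.u16[index] above.
std::uint16_t half_of(std::uint32_t value, std::size_t index)
{
    assert(index < 2);
    // Copy the object representation into a byte buffer first.
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    // Then copy two of those bytes into a real uint16_t object.
    std::uint16_t half;
    std::memcpy(&half, bytes + index * sizeof half, sizeof half);
    return half;
}
```

No object is ever accessed through an lvalue of the wrong type here, so the strict aliasing rule does not come into play. Compilers typically optimize small fixed-size memcpy calls like these into plain loads.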
Upvotes: 28
Reputation: 81115
Type punning and aliasing are distinct but related concepts that some compiler writers seem unable to distinguish despite their being largely orthogonal.
Type punning refers to situations in which storage is written as one type and read as another type, typically for the purpose of allowing a value to be interpreted as a sequence of bits, allowing a sequence of bits to be interpreted as a value, or allowing a value to be used as another type whose representation matches, at least in the portion of interest. For example, the latter form of type punning may be useful in situations where one may have pointers to a variety of structure types, all of which share a Common Initial Sequence, and may need to operate on common-initial-sequence members of all of those structures despite the structures' different types. Note that even though the Standard includes explicit guarantees which would suggest that the latter form of type punning is supposed to be useful, compilers that confuse it with aliasing don't support such constructs.
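To make the Common Initial Sequence case concrete, here is a small sketch. All names in it (Circle, Rect, Shape, kind_of, make_rect) are invented for illustration. It accesses the shared member directly through the union, which is the form the guarantee is written for; whether a given compiler also honors it through separately derived pointers is exactly the concern raised above.

```cpp
#include <cassert>

constexpr int kKindCircle = 1;
constexpr int kKindRect = 2;

// Both structs are standard-layout and start with the same member,
// so `kind` belongs to their common initial sequence.
struct Circle { int kind; double radius; };
struct Rect   { int kind; double width; double height; };

union Shape {
    Circle circle;
    Rect rect;
};

// Builds a Shape whose active member is `rect`.
Shape make_rect(double width, double height)
{
    Shape s;
    s.rect = Rect{kKindRect, width, height};
    return s;
}

// Reads the shared `kind` member through `circle`, whichever member
// was written last.
int kind_of(const Shape& s)
{
    int kind = s.circle.kind;
    assert(kind == kKindCircle || kind == kKindRect);
    return kind;
}
```

Calling kind_of(make_rect(2.0, 3.0)) reads the tag through circle even though rect was the member written, and yields kKindRect.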
Aliasing refers to a different concept in which storage is accessed using two or more simultaneously-active but seemingly-unrelated means, in ways that interact with each other. Given something like:
int test1(int *p1, int *p2)
{
*p1 = 1;
*p2 = 2;
return *p1;
}
if p1==p2, then p1 and p2 will alias since p1 will be used to access the storage identified by p2 sometime between the creation and last use of p2, in a context wherein p1 cannot have been created from p2 [it's possible that p1 might have been created from p2 before the function was called, but there's no way p1 could have been derived from p2 within the function]. Because the Standard allows aliasing between lvalues that identify the same type, however, the above construct would have defined behavior when p1==p2, despite the fact that p1 and p2 alias.
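A quick way to see the defined behavior is to call such a function with the same address twice. This sketch repeats the function under the made-up name store_both so that it stands alone; the null check is an addition for illustration.

```cpp
#include <cassert>

// Same shape as test1 above: two stores through int*, then a load.
int store_both(int* p1, int* p2)
{
    assert(p1 != nullptr && p2 != nullptr);
    *p1 = 1;
    *p2 = 2;
    // Both pointers have type int*, so the compiler must assume they
    // may alias and reload *p1 instead of returning a cached 1.
    return *p1;
}
```

With int x; the call store_both(&x, &x) returns 2, while a call with two distinct objects returns 1.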
On the other hand, given something like:
struct s1 {int x; };
struct s2 {int x; };
union s1s2 {struct s1 v1; struct s2 v2; } uarr[100];
int test1(int i, int j)
{
int temp;
{ struct s1 *p1 = &uarr[i].v1; temp = p1->x; }
if (temp)
{ struct s2 *p2 = &uarr[j].v2; p2->x = 1; }
{ struct s1 *p3 = &uarr[i].v1; temp = p3->x; }
return temp;
}
Here, the pointers p1, p2, and p3 have obviously-disjoint lifetimes and consequently are not simultaneously active and do not alias each other. Each pointer is independently derived from uarr, and the lifetime of each pointer will end prior to the next use of uarr. Consequently, this code makes use of type punning to access the same storage as both a struct s1 and a struct s2, but as written does not exploit aliasing since all the accesses to the storage in question are visibly derived from the same root-level object uarr.
Unfortunately, even though type-based access rules were intended (according to both the Rationale and a footnote) to indicate when things are allowed to alias, some compilers interpret them in ways that make language features such as the Common Initial Sequence guarantee essentially useless, since they use the type-access rules as an excuse to rewrite the code in such a way as to remove the derivation of p3 from uarr, thus introducing aliasing where there had been none.
Upvotes: 12
Reputation: 29
There are perfectly good reasons to use punning. Imagine you want to transmit data over a serial link but the data is actually a packed structure of different types. The packed structure is sent as a BYTE array, but to display the data which is of different types...
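#include <stdio.h>

int main(void)
{
    unsigned char a[10] = {1,2,3,4,5,6,7,8,9,0};
    unsigned int x,y,z;
    /* These casts assume a 4-byte unsigned int and a compiler and CPU
       that tolerate unaligned, type-punned reads. */
    x = *(unsigned int*) a;       /* bytes a[0]..a[3] */
    y = *(unsigned int*) (a+1);   /* bytes a[1]..a[4], unaligned */
    z = *((unsigned int*) a+1);   /* bytes a[4]..a[7] */
    printf("x = %08X, y = %08X, z = %08X\n",x,y,z);
    return 0;
}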
Output: x = 04030201, y = 05040302, z = 08070605
Note that this output is for a little-endian machine (LSB in lower memory).
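If the wire format is known to be little endian, the same values can be produced without pointer casts by assembling each integer from its bytes. This is only a sketch of that alternative, not the method used above; the function name read_u32_le and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads a 32-bit little-endian value that starts
// at buf[offset]. `len` is the total size of the buffer.
std::uint32_t read_u32_le(const unsigned char* buf, std::size_t len,
                          std::size_t offset)
{
    assert(buf != nullptr);
    assert(offset <= len && len - offset >= 4);
    // Shift each byte into place; the lowest address holds the LSB.
    return  static_cast<std::uint32_t>(buf[offset])
         | (static_cast<std::uint32_t>(buf[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(buf[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(buf[offset + 3]) << 24);
}
```

For the array a from the example, offsets 0, 1 and 4 give 0x04030201, 0x05040302 and 0x08070605, matching the output shown, and the result is the same on big-endian hosts and for unaligned offsets.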
Upvotes: 2