Reputation: 4796
According to this answer the following code invokes undefined behavior:
uint16_t *buf = malloc(16); // 8*sizeof(uint16_t)
buf[1] = *buf = some_value;
((uint32_t *)buf)[1] = *(uint32_t *)buf;
((uint64_t *)buf)[1] = *(uint64_t *)buf;
We may write any type to malloc() memory, but we may not read a previously written value as an incompatible type by casting pointers (with the exception of char).
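As a sketch of the char exception (assuming a hosted C99-or-later implementation; byte_sum, first32_of and demo are hypothetical names, and 0x0102 stands in for some_value), the code below never reads the uint16_t data through a uint32_t lvalue. It inspects bytes through unsigned char and gets the first 32 bits with memcpy(), which also copies as characters. memcpy() is a swapped-in alternative, not something the question proposes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums the bytes of any object by reading them
   through unsigned char lvalues, which 6.5p7 always permits. */
static unsigned byte_sum(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bytes[i];
    return sum;
}

/* Hypothetical helper: copies the first 32 bits of buf into a uint32_t.
   memcpy() reads the source bytes as characters, so no uint32_t lvalue
   ever reads memory whose effective type is uint16_t. */
static uint32_t first32_of(const uint16_t *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Usage sketch mirroring the question: allocate, store uint16_t values,
   then obtain the first 32 bits without a pointer cast. */
static uint32_t demo(void)
{
    uint16_t *buf = malloc(16); /* 8 * sizeof(uint16_t) */
    if (buf == NULL)
        return 0;
    buf[1] = *buf = 0x0102;     /* hypothetical stand-in for some_value */
    uint32_t first32bits = first32_of(buf);
    free(buf);
    return first32bits;
}
```

Because both 16-bit halves hold the same value, demo() yields 0x01020102 regardless of byte order.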
Could I use this union:
union Data {
uint16_t u16[8];
uint32_t u32[4];
uint64_t u64[2];
};
As such:
union Data *buf = malloc(16);
buf->u16[1] = buf->u16[0] = some_value;
buf->u32[1] = buf->u32[0];
buf->u64[1] = buf->u64[0];
In order to avoid undefined behavior via strict aliasing violations? Also, could I cast buf to any of uint16_t *, uint32_t *, or uint64_t * and then dereference it without invoking undefined behavior, since these types are all valid members of union Data? In other words, is the following valid?
uint16_t first16bits = *(uint16_t *)buf;
uint32_t first32bits = *(uint32_t *)buf;
uint64_t first64bits = *(uint64_t *)buf;
If not (i.e., if the above code making use of union Data is still invalid), when can unions be used (in pointer casts or otherwise) to produce valid code that does not violate strict aliasing rules, and when can they not?
Upvotes: 3
Views: 170
Reputation: 81189
The construct someUnion.someArray[i] is defined as meaning *(someUnion.someArray+i), with the latter being an access to an lvalue of the array element type that has no relation whatsoever to the union type.
C implementations will generally recognize a construct which is written using the array-bracket notation as having an association with the union type, even in cases where they would not do so if the construct were written using explicit pointer arithmetic syntax. Such special treatment for array-bracket notation, however, is purely up to the discretion of individual implementations.
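To make the distinction concrete, here is a sketch (read_bracket and read_pointer are hypothetical names; union Data is the question's) of two accessors that the standard treats as identical. A runtime check can only confirm they return the same value when no aliased writes are mixed in; whether an optimizer ties the second form to the union type is the implementation-specific part described above.

```c
#include <stdint.h>

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* Array-bracket form. Implementations commonly (but not necessarily)
   treat this as an access associated with union Data. */
static uint32_t read_bracket(const union Data *d, int i)
{
    return d->u32[i];
}

/* The same access with explicit pointer arithmetic. E1[E2] is defined as
   (*((E1)+(E2))), so this is a plain uint32_t lvalue access with no
   visible link to union Data. */
static uint32_t read_pointer(const union Data *d, int i)
{
    return *(d->u32 + i);
}
```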
On the flip side, a pointer to an object may only be converted to a pointer to a union type containing that object if the pointer satisfies the alignment requirements of all members within the union, without regard for whether the members in question are accessed. On platforms that do not support unaligned access, clang will process a construct like:
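union quadbyte {
    unsigned char bb[4];
    unsigned short hh[2];
    unsigned int ww[1];
};

unsigned test(union quadbyte *src)
{
    /* The cast keeps the shift in unsigned arithmetic; shifting a promoted
       int left by 16 would overflow when hh[1] >= 0x8000. */
    return src->hh[0] | ((unsigned)src->hh[1] << 16);
}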
in a manner that will fail if src isn't properly aligned for type union quadbyte (whose alignment is at least that of its unsigned int member), even if it would be properly aligned for type unsigned short.
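A minimal sketch of both sides of this point, assuming C11 (for _Alignof), a 16-bit unsigned short and an at-least-32-bit unsigned int as the example above does. The helper names aligned_for_quadbyte and combine_halves are hypothetical. The first checks whether a pointer may be converted to union quadbyte *; the second avoids the union pointer entirely, so it needs only unsigned short alignment.

```c
#include <stdbool.h>
#include <stdint.h>

union quadbyte {
    unsigned char bb[4];
    unsigned short hh[2];
    unsigned int ww[1];
};

/* True if p meets the alignment of union quadbyte, which is at least that
   of its most strictly aligned member (unsigned int here). Testing the
   low bits of a uintptr_t is implementation-defined, but works on
   mainstream platforms. */
static bool aligned_for_quadbyte(const void *p)
{
    return (uintptr_t)p % _Alignof(union quadbyte) == 0;
}

/* Alternative requiring only unsigned short alignment: read the halves
   through unsigned short lvalues and never form a union quadbyte pointer. */
static unsigned combine_halves(const unsigned short *src)
{
    return src[0] | ((unsigned)src[1] << 16);
}
```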
Upvotes: 2
Reputation: 224082
Yes, it is acceptable to write one union member and read another. Section 6.5p7 of the C standard states:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type
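A minimal sketch of writing one member and reading another, assuming C99 TC3 or later and a declared union object (not the malloc()ed buffer from the question). pun_u16_to_u32 and pun_u16_to_u64 are hypothetical names. Every access goes through the union object itself, and the stored bytes are reinterpreted as the new type (C11 6.5.2.3). The test values repeat the same 16-bit pattern so the result does not depend on byte order.

```c
#include <stdint.h>

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* Store through u16, read through u32: both halves of u32[0] are written. */
static uint32_t pun_u16_to_u32(uint16_t lo, uint16_t hi)
{
    union Data d;
    d.u16[0] = lo;
    d.u16[1] = hi;
    return d.u32[0];
}

/* Store four uint16_t values, then read all 64 bits through u64. */
static uint64_t pun_u16_to_u64(uint16_t v)
{
    union Data d;
    for (int i = 0; i < 4; i++)
        d.u16[i] = v;
    return d.u64[0];
}
```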
It is also safe to convert the address of a union to that of any of its members. From section 6.7.2.1p16:
The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
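A sketch of what that paragraph guarantees, using the question's union Data. conversions_agree is a hypothetical helper. It checks only that the converted pointers address the same place as each member and that the conversion back yields the union pointer. Whether a read through a converted pointer is valid still depends on 6.5p7 and on which member was last stored.

```c
#include <stdbool.h>
#include <stdint.h>

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* A pointer to the union, converted, points to each member (here, to the
   start of each array), and the reverse conversion recovers the union. */
static bool conversions_agree(union Data *d)
{
    uint16_t *p16 = (uint16_t *)d;
    uint32_t *p32 = (uint32_t *)d;
    uint64_t *p64 = (uint64_t *)d;

    return (void *)p16 == (void *)d->u16
        && (void *)p32 == (void *)d->u32
        && (void *)p64 == (void *)d->u64
        && (union Data *)p32 == d;   /* "and vice versa" */
}
```

For example, after storing d.u32[0] = 7 in a declared union Data d, reading *(uint32_t *)&d yields 7, because the uint32_t lvalue matches the member that was last stored.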
Upvotes: 3