I've done a lot of reading recently on reinterpret_cast, as I want to ensure I'm using it correctly and not accidentally invoking undefined behavior. I feel like cppreference and this great writeup on strict aliasing have me 95% of the way there, but I wanted some clarification on my understanding of what is and is not UB.
Let's say I have a struct:
struct __attribute__((packed)) SimpleStruct {
uint32_t a = 0;
uint8_t b = 1;
int16_t c = 2;
uint8_t d[5] = {0, 1, 2, 3, 4};
};
I've used the __attribute__((packed)) attribute (a GCC/Clang extension, not standard C++) to ensure no padding bytes are inserted, to the detriment of performance/optimizations. Per the standard, examining the byte representation of the object via a reinterpret_cast to unsigned char * is allowed and is not UB:
SimpleStruct simple_struct{};
unsigned char *bytes_of_simple_struct = reinterpret_cast<unsigned char *>(&simple_struct);
Now, and this is the part I wanted clarification on: I believe modifying bytes of the struct via this pointer is also allowed and not UB (as long as you stay within the bounds of the object):
static_assert(sizeof(simple_struct) == 12);
bytes_of_simple_struct[0] = 0x1U;
Now, I understand that the resulting value of simple_struct.a depends on the endianness of the system. However, accessing simple_struct.a after this modification of its bytes is still defined behavior, correct? As long as I haven't modified the bytes into an invalid representation of the type they make up, behavior should still be defined.
Conversely, if my struct had a bool instead:
struct __attribute__((packed)) SimpleStruct {
bool a_bool = false;
uint8_t b = 1;
int16_t c = 2;
uint8_t d[5] = {0, 1, 2, 3, 4};
};
Then doing something like this:
bytes_of_simple_struct[0] = 0xFFU;
assert(simple_struct.a_bool == false);
Would be invoking UB, since I've now modified the underlying bytes of a_bool such that they are no longer a valid representation of type bool. Basically, as long as any byte modification still obeys the rules for which byte patterns can represent each type, behavior should be defined? And for the basic integer types you can essentially set the bytes to anything (whether or not that is useful is another story), since any byte value is a valid uint8_t, any two bytes are a valid uint16_t, etc.
Is my understanding correct?
For anyone who may find this helpful, here is a summary of what I was missing about undefined behavior, in the form of some commented code:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <cassert>
struct __attribute__((packed)) SimpleStruct
{
bool a_bool = false;
uint8_t b = 1;
int16_t c = 2;
uint8_t d[6] = { 0, 1, 2, 3, 4, 5 };
};
int
main ()
{
SimpleStruct simple_struct{};
// Ensuring padding has indeed been removed from the struct with __attribute__((packed))
static_assert (sizeof (simple_struct) == 10);
// Defined behavior: casting to unsigned char * (or std::byte * since C++17) to view the byte representation of an object is allowed
unsigned char *bytes_of_simple_struct =
reinterpret_cast <unsigned char *>(&simple_struct);
for (size_t i = 0; i < sizeof (simple_struct); i++)
{
printf("Byte %zu of struct: %02X\n", i, bytes_of_simple_struct[i]);
}
// Defined behavior: SimpleStruct is trivially copyable, so using memcpy() to copy bytes into it is allowed
static_assert(sizeof(bool) == 1);
uint8_t byte_array[sizeof(SimpleStruct)] = {
0x00U, // Must be 0x00U or 0x01U: on common ABIs these are the only valid byte representations of bool
0x00U,
0x00U,
0x00U,
0x00U,
0x00U,
0x00U,
0x00U,
0x00U,
0x00U,
};
static_assert(sizeof(simple_struct) == sizeof(byte_array));
memcpy(&simple_struct, byte_array, sizeof(simple_struct));
assert(simple_struct.b == 0);
// Also defined behavior: the aliasing rule ([basic.lval]) allows accessing, i.e. reading or
// modifying, any object through an lvalue of type char, unsigned char or std::byte, so this
// write is no more a strict-aliasing violation than the reads in the loop above.
// It is also safe here because every byte value is a valid uint8_t.
bytes_of_simple_struct[1] = 0xFFU;
assert(simple_struct.b == 0xFFU);
// An equivalent way to write the same byte, using memcpy(). It is not "more defined"
// than the direct write above; both simply copy a byte into the object's storage.
unsigned char a_byte = 0xFFU;
memcpy(bytes_of_simple_struct + 1, &a_byte, sizeof(a_byte));
assert(simple_struct.b == 0xFFU);
// However, extreme care must be taken to ensure that the bytes written (by memcpy() or by a
// direct byte write) still form a valid representation of the member they land in. If they
// don't, reading that member through its own type is undefined behavior. For example:
a_byte = 0xFFU;
memcpy(bytes_of_simple_struct, &a_byte, sizeof(a_byte));
// On common ABIs (e.g. x86-64 System V) a bool can only be represented by the bytes 0x00 and
// 0x01. After copying 0xFF into a_bool, any read of simple_struct.a_bool is undefined behavior.
// For example, compiled with GCC 13.2, both of these assertions pass!
assert(simple_struct.a_bool != false);
assert(simple_struct.a_bool != true);
}