Reputation: 691
What most people are concerned about is what happens if they receive a byte array with data and they want to cast the array to a struct pointer - this can violate strict aliasing rules. I'm not sure whether initializing an empty byte array of sufficient size, casting it to a struct pointer, and then populating the struct members would violate the strict aliasing rules.
The details: Say I have 2 packed structs:
#pragma pack(1)
typedef struct
{
    int a;
    char b[2];
    uint16_t c : 8;
    uint16_t d : 7;
    uint16_t e : 1;
} in_t;
typedef struct
{
    int x;
    char y;
} out_t;
#pragma pack()
I have many types of in/out packed structs for different messages, so please ignore the specific members I put in the example. The structs can contain bitfields, other structs, and unions. Also, endianness is taken care of. Also, I can't use features from newer C standards (>= C99).
I'm receiving a buffer containing in_t (the buffer is large enough to contain out_t, however big it'll be) as void *:
void recv_msg(void *data)
{
    in_t *in_data = (in_t*)data;
    out_t *out_data = (out_t*)data;
    // ... do something with in_data then set values in out_t.
    // make sure values aren't overwritten.
}
Now I have a new type of in struct
#pragma pack(1)
typedef struct
{
    int a;
    char b[3];
    uint32_t c;
} in_new_api_t;
typedef struct
{
    int x;
    char y[2];
} out_new_api_t;
#pragma pack()
Now, when moving to the new API but keeping the old API for backward compatibility, I want to copy values from the old in_t to in_new_api_t, use in_new_api_t, set values in out_new_api_t, and then copy the values to out_t.
The way I thought of doing it is to allocate an empty byte array of size max(sizeof(in_new_api_t), sizeof(out_new_api_t)), cast it to in_new_api_t *, translate values from in_t to in_new_api_t, send the new API struct to the new API function, then translate values from out_new_api_t to out_t.
void recv_msg(void *data)
{
    uint8_t new_api_buf[max(sizeof(in_new_api_t), sizeof(out_new_api_t))] = {0};
    in_new_api_t *new_in_data = (in_new_api_t*)new_api_buf;
    in_t *in_data = (in_t*)data;
    // ... copy values from in_data to new_in_data
    // I'M NOT SURE I CAN ACCESS MEMBERS OF new_in_data WITHOUT VIOLATING STRICT ALIASING RULES.
    new_in_data->a = in_data->a;
    memcpy(new_in_data->b, in_data->b, 2);
    // ...
    new_recv_msg((void*)new_in_data);
    out_new_api_t *new_out_data = (out_new_api_t*)new_api_buf;
    out_t *out_data = (out_t*)data;
    // ... copy values from new_out_data to out_data
}
The point I'm just not sure about is whether casting from 'uint8_t []' to 'in_new_api_t *' would violate the strict aliasing rules or cause any other issues. Access performance is also a concern.
And if so, what is the best solution?
I can make copies of in_t and out_t and make in_new_api_t point to data, but then I need to copy the data 4 times to make sure I'm not overwriting values: from data to in_t tmp_in, from tmp_in to in_new_api, then from out_new_api to out_t tmp_out, and from tmp_out to out_t out.
Upvotes: 2
Views: 1369
Reputation: 214465
It is fairly straightforward: de-referencing a pointer to data whose actual type is different from the pointed-to type, which is what can happen behind that void*, is a strict aliasing violation. So your code looks wildly unsafe and is also a bit confusing because of the void pointer. So number one is to get rid of that icky, dangerous void pointer! You can create a type such as:
typedef union
{
    in_t old;
    in_new_api_t new;
    uint8_t bytes [sizeof(in_new_api_t)];
} in_api_t;
Then use this as parameter to your function.
This will first of all allow you to access the initial parts of each struct in a safe manner that doesn't violate aliasing (6.5.2.3, the rule about common initial sequence). That is, the member a will correspond in both structs. Strictly speaking, b is not part of the common initial sequence, because char[2] and char[3] are not compatible types. The members that aren't the same can't be relied on - those will have to be copied explicitly with memcpy.
Second, you can now use the bytes member when you need to serialize the data. If you write the "out" structures as unions in a similar manner, and they too contain a bytes member of exactly the same size, you can safely cast from one type to the other, without strict aliasing violations. This is allowed by C11 6.5:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object
/--/
- an aggregate or union type that includes one of the aforementioned types among its members
If your union is accessed by a pointer to union type, that includes a byte array of exactly the same size (a compatible type), then that's allowed.
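A minimal sketch of how the union above might be used to upgrade an old message in place. The structs are simplified (bit-fields omitted), `upgrade_in_place` is a hypothetical helper name, and `<stdint.h>` is assumed to be available even though it is formally C99. To stay on the safe side, the sketch relies on the common initial sequence only for `a` and copies `b` explicitly through a temporary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct { int a; char b[2]; } in_t;                 /* simplified: bit-fields omitted */
typedef struct { int a; char b[3]; uint32_t c; } in_new_api_t;
#pragma pack()

typedef union
{
    in_t old;
    in_new_api_t new;
    uint8_t bytes [sizeof(in_new_api_t)];
} in_api_t;

/* Hypothetical helper: turn an old-style message into a new-style one in place. */
void upgrade_in_place(in_api_t *msg)
{
    char b_copy[2];

    assert(msg != NULL);

    /* Save the old payload before any new member is written. */
    memcpy(b_copy, msg->old.b, sizeof b_copy);

    /* 'a' has the same type at the start of both structs, so it is
       readable through either union member and needs no copy. */
    memcpy(msg->new.b, b_copy, sizeof b_copy);
    msg->new.b[2] = 0;   /* member that exists only in the new layout */
    msg->new.c = 0;      /* member that exists only in the new layout */
}
```

All accesses go through the union members themselves rather than through pointers cast from `void *`, which is the point of replacing the void pointer with a union parameter.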
Upvotes: 2
Reputation: 2898
What you are doing in recv_msg() clearly is undefined behaviour and will likely break your code some day, as the compiler is entitled to do whatever it wants when moving from *in_data to *out_data. Also, if the void* data argument doesn't come from either a malloc() (and cousins) or from an object that originally was an in_t, then you have UB and alignment problems even there.
Your method to save RAM is extremely risky. Even if you are bold enough to ignore the more theoretical UB case of accessing memory with an illegal but correctly aligned type, you still will get problems as there simply is no guarantee that the order of operations of copying in-place from one struct to the other won't trash your data.
Upvotes: 1
Reputation: 15144
It sounds like what you want is a couple of union types. The common initial sequences of the struct members of a union are layout-compatible, per the standard, and can be mapped onto each other, in exactly the same way as the family field of every sockaddr_* type. Type-punning on a union is legal in C, but not in C++ (although it works with POD on every compiler, no compiler that tries to be compatible with existing code will ever break it, and any possible alternative is undefined behavior too). This might possibly obviate the need for a copy.
A union is guaranteed to be properly aligned for both. If you do use pointers, it is probably a good idea to _Alignas the object to both types, just in case.
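As a rough illustration, the scratch byte array from the question could be replaced by a union of the two new-API structs: the union is automatically as large as its largest member and suitably aligned for both, so no max() or cast is needed. The names `new_api_msg_t`, `new_recv_msg_u` and `run_new_api` are hypothetical, and the stand-in API function below takes a pointer to the union rather than a `void *`, which is an assumption about how the new API could be shaped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct { int a; char b[3]; uint32_t c; } in_new_api_t;
typedef struct { int x; char y[2]; } out_new_api_t;
#pragma pack()

/* Scratch storage: sized and aligned for both members by the compiler. */
typedef union
{
    in_new_api_t  in;
    out_new_api_t out;
} new_api_msg_t;

/* Hypothetical stand-in for the new API: reads the request, then
   overwrites the same storage with the reply. */
static void new_recv_msg_u(new_api_msg_t *msg)
{
    int  a;
    char b0, b1;

    assert(msg != NULL);

    /* Read every input into locals before writing any output,
       because 'in' and 'out' share the same bytes. */
    a  = msg->in.a;
    b0 = msg->in.b[0];
    b1 = msg->in.b[1];

    msg->out.x    = a * 2;
    msg->out.y[0] = b0;
    msg->out.y[1] = b1;
}

/* Hypothetical caller: fill the request, call the API, read the reply. */
int run_new_api(int a, char b0, char b1, char *y0)
{
    new_api_msg_t msg;

    memset(&msg, 0, sizeof msg);
    msg.in.a    = a;
    msg.in.b[0] = b0;
    msg.in.b[1] = b1;

    new_recv_msg_u(&msg);

    *y0 = msg.out.y[0];
    return msg.out.x;
}
```

Reading all inputs before writing any output is what keeps the shared storage from being trashed midway through the translation.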
A memcpy() to and from arrays of unsigned char is also legal; the language standards call the contents of the array after the copy the object representation.
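The memcpy() route could look roughly like the sketch below: the incoming bytes are copied into real, correctly typed local objects, translated, and the result is copied back, so no struct is ever accessed through a cast pointer. The structs are simplified (bit-fields omitted), and `new_recv_msg_typed` is a hypothetical stand-in for the new API with an assumed typed signature, not the function from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct { int a; char b[2]; } in_t;                 /* simplified: bit-fields omitted */
typedef struct { int x; char y; } out_t;
typedef struct { int a; char b[3]; uint32_t c; } in_new_api_t;
typedef struct { int x; char y[2]; } out_new_api_t;
#pragma pack()

/* Hypothetical stand-in for the new API, taking typed objects. */
static void new_recv_msg_typed(const in_new_api_t *in, out_new_api_t *out)
{
    out->x    = in->a + 1;
    out->y[0] = in->b[0];
    out->y[1] = in->b[1];
}

void recv_msg(void *data)
{
    in_t          old_in;
    in_new_api_t  new_in;
    out_new_api_t new_out;
    out_t         old_out;

    assert(data != NULL);

    /* Bytes -> a genuine in_t object; no aliasing or alignment concerns. */
    memcpy(&old_in, data, sizeof old_in);

    /* Translate old request to new request. */
    memset(&new_in, 0, sizeof new_in);
    new_in.a = old_in.a;
    memcpy(new_in.b, old_in.b, sizeof old_in.b);

    new_recv_msg_typed(&new_in, &new_out);

    /* Translate new reply to old reply, then write it back as bytes. */
    old_out.x = new_out.x;
    old_out.y = new_out.y[0];
    memcpy(data, &old_out, sizeof old_out);
}
```

This uses two memcpy() calls on the raw buffer plus member-wise translation. For structs this small, compilers commonly turn the fixed-size copies into a few register moves, though that is worth checking on the actual target if access performance matters.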
Upvotes: 2