Gil-Mor

Reputation: 691

Can casting an empty byte array to a struct pointer violate strict aliasing?

Most discussions of this topic cover the case where you receive a byte array that already holds data and cast it to a struct pointer, which can violate the strict aliasing rules. My case is different: I'm not sure whether initializing an empty byte array of sufficient size, casting it to a struct pointer, and then populating the struct members through that pointer would violate the strict aliasing rules.

The details: Say I have 2 packed structs:

#pragma pack(1)

typedef struct
{
    int a;
    char b[2];
    uint16_t c : 8;
    uint16_t d : 7;
    uint16_t e : 1;
} in_t;

typedef struct
{
    int x;
    char y;
} out_t;

#pragma pack()

I have many types of in/out packed structs for different messages, so please ignore the specific members I put in the example. The structs can contain bitfields, other structs, and unions. Endianness is already taken care of. Also, I can't use features from newer C standards (C99 and later).

I receive a buffer containing an in_t (the buffer is large enough to hold an out_t, however big it'll be) as a void *:

void recv_msg(void *data)
{
    in_t *in_data = (in_t*)data;
    out_t *out_data = (out_t*)data;
    // ... do something with in_data then set values in out_t. 
    // make sure values aren't overwritten.
}

Now I have a new type of in struct

#pragma pack(1)

typedef struct
{
    int a;
    char b[3];
    uint32_t c;
} in_new_api_t;

typedef struct
{
    int x;
    char y[2];
} out_new_api_t;

#pragma pack()

Now, when moving to the new api but keeping the old api for backward compatibility, I want to copy values from the old in_t to in_new_api_t, use in_new_api_t, set values in out_new_api_t, and then copy the values to out_t.

The way I thought of doing it is to allocate an empty byte array of size max(sizeof(in_new_api_t), sizeof(out_new_api_t)), cast it to in_new_api_t *, translate values from in_t to in_new_api_t, pass the new api struct to the new api function, then translate values from out_new_api_t to out_t.

void recv_msg(void *data)
{
    uint8_t new_api_buf[max(sizeof(in_new_api_t), sizeof(out_new_api_t))] = {0};
    in_new_api_t *new_in_data = (in_new_api_t*)new_api_buf;

    in_t *in_data = (in_t*)data;

    // ... copy values from in_data to new_in_data
    // I'M NOT SURE I CAN ACCESS MEMBERS OF new_in_data WITHOUT VIOLATING STRICT ALIASING RULES. 

    new_in_data->a = in_data->a;
    memcpy(new_in_data->b, in_data->b, 2);
    // ...

    new_recv_msg((void*)new_in_data);

    out_new_api_t *new_out_data = (out_new_api_t*)new_api_buf;

    out_t *out_data = (out_t*)data;

    // ... copy values from new_out_data to out_data

}

The point I'm not sure about is whether casting from 'uint8_t []' to 'in_new_api_t *' would violate the strict aliasing rules or cause any other issues. Access performance is also a concern.

And if so, what is the best solution?

I can make copies of in_t and out_t and make in_new_api_t point to data but then I need to copy the data 4 times to make sure I'm not overwriting values: from data to in_t tmp_in, from tmp_in to in_new_api, then from out_new_api to out_t tmp_out and from tmp_out to out_t out.

Upvotes: 2

Views: 1369

Answers (3)

Lundin

Reputation: 214465

It is fairly straight-forward:

  • Casting to a pointer-to-struct type and then accessing the data through it, when the object the void* points at actually has a different type, is a strict aliasing violation.
  • Doing the same from a pointer to a raw character buffer is also a strict aliasing violation. (You may however go the other way around: from pointer-to-struct into pointer-to-char.)

So your code looks wildly unsafe and is also a bit confusing because of the void pointer. So number one is to get rid of that icky, dangerous void pointer! You can create a type such as:

typedef union
{
  in_t          old;
  in_new_api_t  new;
  uint8_t       bytes [sizeof(in_new_api_t)];
} in_api_t;

Then use this as parameter to your function.

This will first of all allow you to access the initial part of each struct in a safe manner that doesn't violate aliasing (6.5.2.3, the rule about common initial sequence). Strictly, that guarantee covers only member a here: b is char[2] in one struct and char[3] in the other, so the common initial sequence ends after a. Everything past it has to be copied explicitly, member by member or with memcpy.

Second, you can now use the bytes member when you need to serialize the data. If you write the "out" structures as unions in a similar manner, and they too contain a bytes member of exactly the same size, you can safely cast from one type to the other, without strict aliasing violations. This is allowed by C11 6.5:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object
/--/
- an aggregate or union type that includes one of the aforementioned types among its members

If your union is accessed by a pointer to union type, that includes a byte array of exactly the same size (a compatible type), then that's allowed.
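As a rough sketch of how the union could be used for the input side, assuming the struct definitions from the question: the helper name upgrade_in and the mapping of the old bitfield c onto the new member c are made up for illustration and are not part of the original code. Every field is copied from a saved snapshot of the old struct, so the result doesn't depend on which members happen to overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct {
    int a;
    char b[2];
    uint16_t c : 8;
    uint16_t d : 7;
    uint16_t e : 1;
} in_t;

typedef struct {
    int a;
    char b[3];
    uint32_t c;
} in_new_api_t;
#pragma pack()

typedef union {
    in_t          old;
    in_new_api_t  new;
    uint8_t       bytes[sizeof(in_new_api_t)];
} in_api_t;

/* Hypothetical helper: rewrite an old-format message as a new-format one, in place. */
static void upgrade_in(in_api_t *msg)
{
    in_t saved;

    assert(msg != NULL);
    saved = msg->old;                          /* snapshot before overwriting anything */
    memset(msg->bytes, 0, sizeof msg->bytes);  /* start from an all-zero new struct */
    msg->new.a = saved.a;
    memcpy(msg->new.b, saved.b, sizeof saved.b);
    msg->new.c = saved.c;                      /* assumed mapping, for illustration only */
}
```

The snapshot costs one small struct copy, but it means the order of the assignments no longer matters.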

Upvotes: 2

Vroomfondel

Reputation: 2898

What you are doing in recv_msg() clearly is undefined behaviour and will likely break your code some day, as the compiler is entitled to do whatever it wants when moving from *in_data to *out_data. Also, if the void* data argument doesn't come from either a malloc() (and cousins) or from an object that originally was an in_t then you have UB and alignment problems even there.

Your method to save RAM is extremely risky. Even if you are bold enough to ignore the more theoretical UB case of accessing memory with an illegal but correctly aligned type, you still will get problems as there simply is no guarantee that the order of operations of copying in-place from one struct to the other won't trash your data.
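One way to sidestep both problems, sketched here with the in_t and out_t definitions from the question: copy the incoming bytes into a real in_t object first, build the reply in a real out_t object, and only then copy the reply back over the buffer. The "processing" in the middle (doubling a, echoing b[0]) is a placeholder invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct {
    int a;
    char b[2];
    uint16_t c : 8;
    uint16_t d : 7;
    uint16_t e : 1;
} in_t;

typedef struct {
    int x;
    char y;
} out_t;
#pragma pack()

void recv_msg(void *data)
{
    in_t  in;
    out_t out;

    assert(data != NULL);

    /* Byte-wise copy into a properly typed object: no aliasing or
       alignment assumptions about what data points at. */
    memcpy(&in, data, sizeof in);

    /* Placeholder processing: reads only the snapshot, writes only out. */
    memset(&out, 0, sizeof out);
    out.x = in.a * 2;
    out.y = in.b[0];

    /* The input is no longer needed, so overwriting the buffer is safe. */
    memcpy(data, &out, sizeof out);
}
```

This costs two small copies per message instead of four, and compilers commonly turn fixed-size memcpy calls like these into plain loads and stores.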

Upvotes: 1

Davislor

Reputation: 15144

It sounds like what you want is a couple of union types. The common initial sequences of the struct members of a union are layout-compatible, per the standard, and can be mapped onto each other, in exactly the same way as the family field of every sockaddr_* type. Type-punning on a union is legal in C, but not in C++ (although it works with POD on every compiler, no compiler that tries to be compatible with existing code will ever break it, and any possible alternative is undefined behavior too). This might possibly obviate the need for a copy.

A union is guaranteed to be properly aligned for all of its members. If you do use pointers into a separate buffer, it is probably a good idea to declare that buffer with _Alignas (C11) for both types, just in case.

A memcpy() to and from arrays of unsigned char is also legal; the language standards call the contents of the array after the copy the object representation.
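Applied to the question's scratch buffer, this could look roughly like the following: the uint8_t array becomes a union of the two new-API structs, so it is a real object of the right type, size, and alignment. This is only a sketch. The names new_api_scratch_t and translate_via_new_api are invented, the body of new_recv_msg is a stand-in for the real new API, and the field mappings are placeholders.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct { int a; char b[2]; uint16_t c : 8; uint16_t d : 7; uint16_t e : 1; } in_t;
typedef struct { int x; char y; } out_t;
typedef struct { int a; char b[3]; uint32_t c; } in_new_api_t;
typedef struct { int x; char y[2]; } out_new_api_t;
#pragma pack()

/* Hypothetical scratch type replacing uint8_t new_api_buf[...]. */
typedef union {
    in_new_api_t  in;
    out_new_api_t out;
} new_api_scratch_t;

/* Stand-in for the real new API: reads an in_new_api_t from data and
   leaves an out_new_api_t in the same storage, using memcpy only. */
static void new_recv_msg(void *data)
{
    in_new_api_t  in;
    out_new_api_t out;

    memcpy(&in, data, sizeof in);
    memset(&out, 0, sizeof out);
    out.x = in.a + 1;                 /* placeholder processing */
    out.y[0] = in.b[0];
    out.y[1] = in.b[1];
    memcpy(data, &out, sizeof out);
}

/* Hypothetical wrapper: old-format input to old-format output via the new API. */
static void translate_via_new_api(const in_t *old_in, out_t *old_out)
{
    new_api_scratch_t scratch;

    assert(old_in != NULL && old_out != NULL);

    memset(&scratch, 0, sizeof scratch);
    scratch.in.a = old_in->a;
    memcpy(scratch.in.b, old_in->b, sizeof old_in->b);
    scratch.in.c = old_in->c;         /* assumed mapping */

    new_recv_msg(&scratch);

    memset(old_out, 0, sizeof *old_out);
    old_out->x = scratch.out.x;       /* read the reply through the union */
    old_out->y = scratch.out.y[0];    /* assumed mapping */
}
```

Because scratch is declared with the union type, no cast from a byte array is needed, and the union's size is already the larger of the two structs, so the max() computation goes away as well.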

Upvotes: 2
