Andreas Grapentin

Reputation: 5796

Casting structure pointers between structs containing pointers to different types?

I have a structure, defined as follows:

struct vector
{
  (TYPE) *items;
  size_t nitems;
};

where TYPE may be literally any type. I also have a type-agnostic structure of a similar kind:

struct _vector_generic
{
  void *items;
  size_t nitems;
};

The second structure is used to pass structures of the first kind of any type to a resizing function, for example like this:

struct vector v;
vector_resize((struct _vector_generic*)&v, sizeof(*(v.items)), v.nitems + 1);

where vector_resize attempts to realloc memory for the given number of items in the vector.

int
vector_resize (struct _vector_generic *v, size_t item_size, size_t length)
{
  void *new = realloc(v->items, item_size * length);
  if (!new)
    return -1;

  v->items = new;
  v->nitems = length;

  return 0;
}

However, the C standard states that pointers to different types are not required to be of the same size.

6.2.5.27:

A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.

Now my question is, should I be worried that this code may break on some architectures?

Can I fix this by reordering my structs such that the pointer member is at the end? For example:

struct vector
{
  size_t nitems;
  (TYPE) *items;
};

And if not, what can I do?

For reference of what I am trying to achieve, see:
https://github.com/andy-graprof/grapes/blob/master/grapes/vector.h

For example usage, see:
https://github.com/andy-graprof/grapes/blob/master/tests/grapes.tests/vector.exp

Upvotes: 4

Views: 1690

Answers (3)

davmac

Reputation: 20631

The code in your question invokes undefined behaviour (UB), because you de-reference a potentially invalid pointer. The cast:

(struct _vector_generic*)&v

... is covered by 6.3.2.3 paragraph 7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

If we assume alignment requirements are met, then the cast does not invoke UB. However, there is no requirement that the converted pointer must "compare equal" with (i.e. point at the same object as) the original pointer, nor even that it points to any object at all - that is to say, the value of the pointer is unspecified - therefore, to dereference this pointer (without first ascertaining that it is equal to the original) invokes undefined behaviour.

(Many people who know C well find this odd. I think this is because they know a pointer cast usually compiles to no operation - the pointer value simply remains as it is - and therefore they see pointer conversion as purely a type conversion. However, the standard does not mandate this).

Even if the pointer after conversion did compare equal with the original pointer, 6.5 paragraph 7 (the so-called "strict aliasing rule") would not allow you to dereference it. Essentially, you cannot access the same object through pointers of two different types, with some limited exceptions.

Example:

struct a { int n; };
struct b { int member; };

struct a a_object;
struct b * bp = (struct b *) &a_object; // bp takes an unspecified value

// The following would invoke UB, because bp may be an invalid pointer:
// int m = bp->member;

// But what if we can ascertain that bp points at the original object?
if (bp == &a_object) {
    // The comparison in the line above actually violates constraints
    // in 6.5.9p2, but it is accepted by many compilers.
    int m = bp->member;   // UB if executed, due to 6.5p7.
}
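
If the goal is only to resize the buffer, one way to avoid both problems described above is to never reinterpret the struct at all, and instead pass the pointer value through a void * temporary, since converting an object pointer to void * and back is well defined. A minimal sketch, assuming a hypothetical concrete struct int_vector (the question's struct vector with TYPE = int) and hypothetical helpers resize_items and int_vector_resize, none of which appear in the question:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical concrete instance of the question's struct vector (TYPE = int). */
struct int_vector
{
  int *items;
  size_t nitems;
};

/* Hypothetical helper: resizes the buffer behind *items.
   It only ever touches a genuine void * object, so no struct is aliased. */
static int
resize_items (void **items, size_t item_size, size_t length)
{
  void *new_items = realloc(*items, item_size * length);
  if (!new_items)
    return -1;

  *items = new_items;
  return 0;
}

/* Type-specific wrapper: converts the pointer by value in both directions. */
static int
int_vector_resize (struct int_vector *v, size_t length)
{
  assert(v != NULL);

  void *tmp = v->items;   /* int * -> void *: well-defined conversion */
  if (resize_items(&tmp, sizeof *v->items, length) != 0)
    return -1;

  v->items = tmp;         /* void * -> int *: well-defined conversion */
  v->nitems = length;
  return 0;
}
```

The wrapper would have to be written (or generated by a macro) once per element type, which is the price of keeping the conversion explicit. The sketch does not guard against overflow in item_size * length.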

Upvotes: 1

Lundin

Reputation: 213711

Let's, for the sake of discussion, ignore that the C standard formally says this is undefined behavior. Undefined behavior simply means that something is beyond the scope of the language standard: anything can happen and the C standard makes no guarantees. There may, however, be "external" guarantees on the particular system you are using, made by those who made the system.

And in the real world where there is hardware, there are indeed such guarantees. There are just two things that can go wrong here in practice:

  • TYPE* having a different representation or size than void*.
  • Different struct padding in each struct type because of alignment requirements.

Both of these seem unlikely and can be guarded against with static asserts:

#include <assert.h> // static_assert (C11)

static void ct_assert (void) // dummy function, never called by anyone
{
  struct vector v1;
  struct _vector_generic v2;

  static_assert(sizeof(v1.items) == sizeof(v2.items), 
                "Err: unexpected pointer format.");
  static_assert(sizeof(v1) == sizeof(v2), 
                "Err: unexpected padding.");
}

Now the only thing left that could go wrong is if a "pointer to x" has the same size but a different representation compared to "pointer to y" on your specific system. I have never heard of such a system anywhere in the real world. But of course, there are no guarantees: such obscure, unorthodox systems may exist. In that case, it is up to you whether you want to support them, or whether it will suffice to be portable to 99.99% of all existing computers in the world.

In practice, the only time you have more than one pointer format on a system is when you are addressing memory beyond the CPU's standard address width, which is typically handled by non-standard extensions such as far pointers. In all such cases, the pointers will have different sizes and you will detect them with the static asserts above.
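
The compile-time checks can also compare member offsets, which covers the padding case more directly than comparing total sizes. A sketch, assuming C11 and a hypothetical struct float_vector standing in for one concrete instantiation of struct vector. Passing these assertions only documents the platform assumptions; it does not make the cast defined by the standard:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical instantiation of the question's struct vector (TYPE = float). */
struct float_vector
{
  float *items;
  size_t nitems;
};

struct _vector_generic
{
  void *items;
  size_t nitems;
};

/* Same pointer size: catches far-pointer style platforms. */
static_assert(sizeof(float *) == sizeof(void *),
              "Err: unexpected pointer format.");

/* Same member offsets: catches differing padding between members. */
static_assert(offsetof(struct float_vector, items)
              == offsetof(struct _vector_generic, items),
              "Err: items at different offsets.");
static_assert(offsetof(struct float_vector, nitems)
              == offsetof(struct _vector_generic, nitems),
              "Err: nitems at different offsets.");

/* Same total size: catches differing trailing padding. */
static_assert(sizeof(struct float_vector) == sizeof(struct _vector_generic),
              "Err: unexpected padding.");
```

Because these assertions sit at file scope, no dummy function is needed, and the translation unit simply fails to compile on a platform where the layouts differ.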

Upvotes: 0

2501

Reputation: 25752

Your code has undefined behavior.

Accessing an object using an lvalue of an incompatible type results in undefined behavior.

The standard defines this in:

6.5 p7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

struct vector and struct _vector_generic are incompatible types and do not fit into any of the above categories. Their internal representation is irrelevant in this case.

For example:

struct vector v;
struct _vector_generic *g = (struct _vector_generic *)&v;
g->nitems = 123;   // undefined!

The same goes for your example, where you pass the address of the struct vector to the function and interpret it as a _vector_generic pointer.

The sizes and padding of the structs could also be different, causing members to be positioned at different offsets.

What you can do is use your generic struct, and cast it depending on the type the void pointer holds in the main code.

struct gen
{
    void *items;
    size_t nitems;
    size_t nsize ;
};

struct gen* g = malloc( sizeof(*g) ) ;
g->nitems = 10 ;
g->nsize = sizeof( float ) ;
g->items = malloc( g->nsize * g->nitems ) ;
float* f = g->items ;
f[g->nitems-1] = 1.2345f ;
...

Using the same struct definition you can allocate for a different type:

struct gen* g = malloc( sizeof(*g) ) ;
g->nitems = 10 ;
g->nsize = sizeof( int ) ;
g->items = malloc( g->nsize * g->nitems ) ;
int* i = g->items ;
...

Since you are storing the size of the type and the number of elements, it is obvious what your resize function would look like (try it).

You will have to be careful to remember what type is used in which variable as the compiler will not warn you because you are using void*.
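
For reference, the resize function left as an exercise above might look something like the following. This is only a sketch: the name gen_resize is hypothetical, and the overflow check is an addition that the answer does not mention.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* realloc */

struct gen
{
    void *items;
    size_t nitems;
    size_t nsize;
};

/* Hypothetical name; resizes g->items to hold `length` elements of g->nsize bytes.
   Returns 0 on success and -1 on failure, leaving g unchanged on failure. */
static int
gen_resize (struct gen *g, size_t length)
{
    assert(g != NULL);

    /* Reject requests whose byte count would overflow size_t. */
    if (g->nsize != 0 && length > SIZE_MAX / g->nsize)
        return -1;

    void *new_items = realloc(g->items, g->nsize * length);
    if (!new_items)
        return -1;

    g->items = new_items;
    g->nitems = length;
    return 0;
}
```

Because the element size lives in the struct, the caller passes only the new length, for example gen_resize(g, g->nitems + 1). A length of 0 is not handled specially here; what realloc returns for a zero-byte request is left to the implementation.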

Upvotes: 2
