cecil
cecil

Reputation: 453

Does the stb lib violate Strict Aliasing rules in C?

I'm interested in the technique used by Sean Barrett to make a dynamic array in C for any type. Comments in the current version claims the code is safe to use with strict-aliasing optimizations: https://github.com/nothings/stb/blob/master/stb_ds.h#L332

You use it like:

int *array = NULL;
arrput(array, 2);
arrput(array, 3);

The allocation it does holds both the array data + a header struct:

typedef struct
{
  size_t      length;
  size_t      capacity;
  void      * hash_table;
  ptrdiff_t   temp;
} stbds_array_header;

The macros/functions all take a void* to the array and access the header by casting the void* array and moving back one:

#define stbds_header(t)  ((stbds_array_header *) (t) - 1)

I'm sure Sean Barrett is far more knowledgeable than the average programmer. I'm just having trouble following how this type of code is not undefined behavior because of the strict aliasing rules in modern C. If this does avoid problems I'd love to understand why it does so I can incorporate it myself (maybe with a few less macros).

Upvotes: 3

Views: 498

Answers (1)

KamilCuk
KamilCuk

Reputation: 140880

Lets follow the expansions of arrput in https://github.com/nothings/stb/blob/master/stb_ds.h :

#define STBDS_REALLOC(c,p,s) realloc(p,s)

#define arrput      stbds_arrput

#define stbds_header(t)  ((stbds_array_header *) (t) - 1)

#define stbds_arrput(a,v)      (stbds_arrmaybegrow(a,1), (a)[stbds_header(a)->length++] = (v))

#define stbds_arrmaybegrow(a,n)  ((!(a) || stbds_header(a)->length + (n) > stbds_header(a)->capacity) \
                                  ? (stbds_arrgrow(a,n,0),0) : 0)

#define stbds_arrgrow(a,b,c)   ((a) = stbds_arrgrowf_wrapper((a), sizeof *(a), (b), (c)))

#define stbds_arrgrowf_wrapper            stbds_arrgrowf

void *stbds_arrgrowf(void *a, size_t elemsize, size_t addlen, size_t min_cap)
{
      ...
      b = STBDS_REALLOC(NULL, (a) ? stbds_header(a) : 0, elemsize * min_cap + sizeof(stbds_array_header));
      //if (num_prev < 65536) prev_allocs[num_prev++] = (int *) (char *) b;
      b = (char *) b + sizeof(stbds_array_header);
      if (a == NULL) {
        stbds_header(b)->length = 0;
        stbds_header(b)->hash_table = 0;
        stbds_header(b)->temp = 0;
      } else {
        STBDS_STATS(++stbds_array_grow);
      }
      stbds_header(b)->capacity = min_cap;

      return b;
}

how this type of code is not undefined behavior because of the strict aliasing

Strict aliasing is about accessing data that has different effective type than data stored there. I would argue that the data stored in the memory region pointed to by stbds_header(array) has the effective type of the stbds_array_header structure, so accessing it is fine. The structure members are allocated by realloc and initialized one by one inside stbds_arrgrowf by stbds_header(b)->length = 0; lines.

how this type of code is not undefined behavior

I think the pointer arithmetic is fine. You can say that the result of realloc points to an array of one stbds_array_header structure. In other words, when doing the first stbds_header(b)->length = inside stbds_arrgrowf function the memory returned by realloc "becomes" an array of one element of stbds_array_header structures, as If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access from https://port70.net/~nsz/c/c11/n1570.html#6.5p6 .

int *array is assigned inside stbds_arrgrow to point to "one past the last element of an array" of one stbds_array_header structure. (Well, this is also the same place where an int array starts). ((stbds_array_header *) (array) - 1) calculates the address of the last array element by subtracting one from "one past the last element of an array". I would rewrite it as (char *)(void *)t - sizeof(stbds_array_header) anyway, as (stbds_array_header *) (array) sounds like it would generate a compiler warning.

Assigning to int *array in expansion of stbds_arrgrow a pointer to (char *)result_of_realloc + sizeof(stbds_array_header) may very theoretically potentially be not properly aligned to int array type, breaking If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined from https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7 . This is very theoretical, as stbds_array_header structure has size_t and void * and ptrdiff_t members, in any normal architecture it will have good alignment to access int (or any other normal type) after it.

I have only inspected the code in expansions of arrput. This is a 2000 lines of code, there may be other undefined behavior anywhere.

Upvotes: 2

Related Questions