Mance Rayder
Mance Rayder

Reputation: 363

Structure over flexible array member

I'm writing a C program (g++ compilable) that has to deal with a lot of different structures, all coming from a buffer with a predefined format. The format specifies which type of structure I should load. This may be solved using unions, but the hugh difference in sizes of the structures made me decide for a structure with a void * in it:

struct msg {
    int type;
    void * data; /* may be any of the 50 defined structures: @see type */
};

The problem with that is that I need 2 malloc calls, and 2 free. For me, function calls are expensive and malloc is expensive. From the users side, it would be great to simple free the msgs. So I changed the definition to:

struct msg {
    int type;
    uint8_t data[]; /* flexible array member */
};
...
struct msg_10 {
    uint32_t flags[4];
    ...
};

Whenever I need to deserialize a message, I do:

struct msg * deserialize_10(uint8_t * buffer, size_t length) {
    struct msg * msg = (struct msg *) malloc(offsetof(struct msg, data) + sizeof(struct msg_10));
    struct msg_10 * payload = (__typeof__(payload))msg->data;

    /* load payload */
    return msg;
}

And to get a member of that structure:

uint32_t msg10_flags(const struct msg * msg, int i)
{
    return ((struct msg_10 *)(msg->data))->flags[i];
}

With this change, gcc (and g++) issue a nice warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] message.

I think this is a common issue (but I found no answer here) on how to represent a family of messages in C, in some efficient manner.

I understand why the warning appeared, my questions are the following:

  1. Is it possible to implement something like this, without the warning or is it intrinsically flawed? (the or is not exclusive :P, and I'm almost convinced I should refactor)
  2. Would it be better to represent the messages using something like the following code?

    struct msg {
        int type;
    };
    ...
    struct msg_10 {
        struct msg; /* or int type; */
        uint32_t flags[4];
        ...
    };
    
  3. If yes, caveats? Can I always write and use the following?

    struct msg * deserialize_10(uint8_t * buffer, size_t length) {
        struct msg_10 * msg = (struct msg_10 *) malloc(sizeof(struct msg_10));
    
        /* load payload */
        return (struct msg *)msg;
    }
    
    uint32_t msg10_flags(const struct msg * msg, int i) {
        const struct msg_10 * msg10 = (const struct msg_10 *) msg;
        return msg10->flags[i];
    }
    
  4. Any other?

I forgot to say that this runs on low level systems and performance is a priority but, all in all, the real issue is how to handle this "multi-message" structure. I may refactor once, but changing the implementation of the deserialization of 50 message types...

Upvotes: 4

Views: 919

Answers (4)

I'm writing a C program (g++ compilable)

This is a misunderstanding.

C source files should be compiled by gcc (not by g++). C++ source files should be compiled by g++ (not by gcc). Remember that GCC means Gnu Compiler Collection (and also contains gfortran and gccgo etc... when suitably configured). So Fortran source files should be compiled with gfortran (if using GCC), Go source files should be compiled with gccgo (if using GCC), Ada code should be compiled with gnat (if using GCC), and so on.

Read about Invoking GCC. Check what happens by passing also -v to your gcc or g++ compiler command (it should invoke the cc1 compiler, not the cc1plus one).

If you insist about compiling a C99 or C11 source file with g++ (not gcc), which is IMHO wrong and confusing, be sure to pass at least the -std=c99 (or -std=gnu11 etc...) and -x c flags.

But you really should fix your build automation or build procedure to use gcc (not g++) to compile C code. Your real issue is that point (some bug in a Makefile or some other thing).

At link time, use g++ if you mix C and C++ code.

Notice that flexible array members don't exist (and never existed) in C++, even in future C++20. In C++ you might use 0 as their declared size, e.g. code:

#ifdef __cplusplus
#define FLEXIBLE_DIM 0
#else
#define FLEXIBLE_DIM /*empty flexible array member*/
#endif

and then declare:

struct msg {
  int type;
  uint8_t data[FLEXIBLE_DIM]; /* flexible array member */
};

but this works only because uint8_t is a POD, and your g++ compiler might (sometimes) give "buffer overflow" or "index out of bounds" warnings (and you should never depend upon the compile-time sizeof of that data field).

Upvotes: 2

Gunther Schulz
Gunther Schulz

Reputation: 625

No mallocs, no frees, no aliasing, function calls are inline, for simple or non-padded naturally aligned structures, inline functions are equivalent to a memcpy or register copy for small simple structures. For more complex structures the compiler does all the heavy lifting.

Since you are deserializing, alignemnet of the byes in the buffer are likely packed and NOT naturally aligned. Take a look at the Linux Kernel source of packed_struct.h (https://elixir.bootlin.com/linux/v3.8/source/include/linux/unaligned/packed_struct.h)

Instead of u16, u32, u64, roll out a function for each of your msg_0..msg_10..msg_(n-1). As you can see from the source file, it is ripe for simplifying each unaligned type and inline function with a few simple macros. Using your example names

struct msg {
    int type;
};
...
struct msg_10 {
    struct msg MsgStruct; /* or int type; */
    uint32_t flags[4];
    ...
};

#define UNALIGNED_STRUCT_NAME(msg_struct_tag) \
    UNA_##msg_struct_tag

#define DECLARE_UNALIGNED_STRUCT(msg_struct_tag) \
  struct UNALIGNED_STRUCT_NAME(msg_struct_tag) \
  {struct msg_struct_tag x;} __attribute__((__packed__))

#define DESERIALIZE_FN_NAME(msg_struct_tag) \
    deserialize_##msg_struct_tag

#define CALL_DESERIALIZE_STRUCT_FN(msg_struct_tag, pbuf) \
    DESERIALIZE_FN_NAME(msg_struct_tag)(pbuf)

#define DEFINE_DESERIALIZE_STRUCT_FN(msg_struct_tag) \
    static inline \
        struct msg_struct_tag DESERIALIZE_FN_NAME(msg_struct_tag)(const void* p) \
    { \
        const struct UNALIGNED_STRUCT_NAME(msg_struct_tag) *ptr = \
            (const struct UNALIGNED_STRUCT_NAME(msg_struct_tag) *)p; \
        return ptr->x; \
    }

...
DECLARE_UNALIGNED_STRUCT(msg_9);
DECLARE_UNALIGNED_STRUCT(msg_10);
DECLARE_UNALIGNED_STRUCT(msg_11);
...
...
DEFINE_DESERIALIZE_STRUCT_FN(msg_9)
DEFINE_DESERIALIZE_STRUCT_FN(msg_10)
DEFINE_DESERIALIZE_STRUCT_FN(msg_11)
...

To deserialize a message 10

struct msg_10 ThisMsg = CALL_DESERIALIZE_STRUCT_FN(msg_10, buffer);

Or to deserialize a message 13 at byte 9 in the buffer

struct msg_13 OtherMsg = CALL_DESERIALIZE_STRUCT_FN(msg_13, &(buffer[9]));

Upvotes: 0

Lundin
Lundin

Reputation: 213940

To dodge the strict aliasing, you can wrap your struct inside a union. With C11 you can use an anonymous struct to get rid of the extra level needed to access "flags":

typedef union
{
  struct
  {
    uint32_t flags[4];
  };  
  uint8_t bytes[ sizeof(uint32_t[4]) ];
} msg_10;

And now you can do msg_10* payload = (msg_10*)msg->data; and access payload without worrying about strict aliasing violations, since the union type includes a type (uint8_t[]) compatible with the effective type of the object.

Please note however, that the pointer returned by malloc has no effective type until you access it through a pointer to a certain type. So alternatively, you can make sure to access the data with the correct type after malloc, and that won't give a strict aliasing violation either. Something like

struct msg_10 * msg = malloc(sizeof(struct msg_10));
struct msg_10 dummy = *msg; 

Where dummy won't be used, it is just there to set the effective type.

Upvotes: 3

Domso
Domso

Reputation: 970

You can certainly build something like this using makros; message_header works as a parent-struct for all message-types. Being the first member of such structs they share the same address. Hence after creating msg(int) and casting it to a message_header you can free it simply by calling free on it. (C++ works somewhat the same btw)

Is this what you wanted?

struct message_header {
    int type;
};

#define msg(T) struct {struct message_header header; T data} 

struct message_header* init_msg_int(int a) {
    msg(int)* result = (msg(int)*)(malloc(sizeof(msg(int))));
    result->data = a;
    return (struct message_header*)result;
}

int get_msg_int(struct message_header* msg) {
    return ((msg(int)*)msg)->data;
}

void free_msg(struct message_header* msg) {
    free(msg);
}    

Upvotes: 2

Related Questions