Reputation: 60067
The standard doesn't seem impose any padding requirements on struct members, even though it does prohibit reordering (6.7.2.1p6). How likely is it that a C platform will not pad minimally, i.e., not add only the minimum amount of padding needed to make sure the next member (or instance of the same struct, if this is the last member) is sufficiently aligned for its type?
Is it even sensible of the standard not to require that padding be minimal?
I'm asking because this lack of a padding guarantee seems to prevent me from portably representing serialized objects as structs (even if I limit myself to just uint8_t
arrays as members, compilers seem to be allowed to add padding in between them), and I'm finding it a little weird to have to resort to offset arithmetic there.
Upvotes: 1
Views: 129
Reputation: 39386
How likely is it that a C platform will not pad minimally, i.e., not add only the minimum amount of padding needed to make sure the next member (or instance of the same struct, if this is the last member) is sufficiently aligned for its type?
Essentially, the "extra" padding may allow significant compiler optimizations.
Unfortunately, I don't know if any compilers actually do that (and therefore cannot provide any estimate on its likelihood of occurring).
As a simple example, consider a 32-bit or 64-bit architecture, where the ABI states that string literals and character arrays are aligned to 32-bit or 64-bit boundary. Many of the C library functions are (also) implemented by the C compiler itself; see e.g. these lists for GCC. The compiler can track the parameters to see if they refer to a string literal or (the beginning of a) character array, and if so, replace e.g. strcmp()
with an optimized built-in version (which does the comparison in 32-bit units, rather than char-at-a-time).
As a more complicated example, consider a RISC hardware architecture, where unaligned byte access is slower than aligned native word access. (For example, the former may be implemented in hardware as the latter, followed by a bit shift.) Such an architecture could have an ABI that requires all structure members to be word-aligned. Then, the C compiler would be required to add more-than-minimal padding.
Traditionally, the C standards committee has been very careful to not exclude any kind of hardware architecture from correctly implementing the language.
Is it even sensible of the standard not to require that padding be minimal?
The purpose of the C standard used to be to ensure that C code would behave in the same manner if compiled with different compilers, and to allow implementation of the language on any sufficiently capable hardware architecture. In that sense, it is very sensible for the standard not to require minimal padding, as some ABIs may require more than minimal padding for whatever reason.
With the introduction of the Microsoft "extensions", the purpose of the C standard has shifted significantly, to binding C to C++ to ensure a C++ compiler can compile C code with minimal differences to C++ compilation, and to provide interfaces that can be marketed as "safer" with the actual purpose of balkanizing developers and binding them to a single vendor implementation. Because this is contrary to the previous purpose of the standard, and it is clearly non-sensible to standardize single-vendor functions like fscanf_s() while not standardizing multi-vendor functions like getline(), it may not be possible to define what sensible means anymore in the context of the C standard. It definitely does not match "good judgment"; it probably now refers to "being perceptible by the senses".
I'm asking because this lack of a padding guarantee seems to prevent me from portably representing serialized objects as structs
You are making the same mistake C programmers make, over and over again. Structs are not suitable for representing serialized objects. You should not use a struct to represent a network object, or a file header, because of the C struct rules.
Instead, you should use a simple character buffer, and either accessor functions (to extract or pack each member or field from the buffer), or conversion functions (to convert the buffer contents to a struct and vice versa).
The underlying reason why even experienced programmers like the asker still would prefer to use a struct instead, is that the accessors/conversion involves a lot of extra code; having the compiler do it instead would be much better: less code, simpler code, easier to maintain.
And I agree. It would even be quite straightforward, if a new keyword, say serialized_struct
was introduced; to introduce a serialized data structure with completely different member rules to traditional C structs. (Note that this support would not affect e.g. linking at all, so it really is not as complicated as one might think.) Additional attributes or keywords could be used to specify explicit byte order, and the compiler would do all the conversion details for us, in whatever way the compiler sees best for the specific architecture it compiler for. This support would only be available for new code, but it would be hugely beneficial in cutting down on interoperability issues -- and it would make a lot of serialization code simpler!
Unfortunately, when you combine the C standard committee's traditional dislike to adding new keywords, and the overall direction change from interoperability to vendor lock-in, there is no chance at all for anything like this to be included in the C standard.
Of course, as described in the comments, there are lots of C libraries that implement one serialization scheme or other. I've even written a few myself (for rather peculiar use cases, though). A sensible approach (poor pun intended) would be to pick a vibrant one (well maintained, with a lively community around the library), and use it.
Upvotes: 1