Reputation: 17276
Recently I learned that when reading and writing data from a memory address, the data must be aligned correctly to avoid potential issues with undefined behaviour.
On some platforms (for example x86) rather than undefined behaviour, there is a performance penalty. This is caused by the compiler producing code which contains multiple loads in place of what would have been a single load for correctly aligned data.
Further, I understand that in many cases, the required alignment is the same width as the datatype. However, this is not a requirement or a rule and there may be exceptions to this.
My understanding is that the alignof operator can be used to correctly align data. This is similar to how sizeof can be used to correctly allocate memory size for data.
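To illustrate the sizeof/alignof analogy, here is a minimal sketch. The names Storage and is_aligned_for are hypothetical helpers made up for this example, and the pointer check assumes the usual flat mapping from pointers to integers.

```cpp
#include <cstddef>
#include <cstdint>

// Raw storage that is large enough (sizeof) and suitably
// aligned (alignof, via alignas) to hold one object of type T.
// Hypothetical helper, for illustration only.
template <typename T>
struct Storage {
    alignas(T) unsigned char bytes[sizeof(T)];
};

// Returns true if the address `p` is a multiple of alignof(T).
// Assumes pointer-to-integer conversion preserves the address value,
// which holds on mainstream platforms.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A Storage<std::uint64_t> object can then be checked with is_aligned_for<std::uint64_t>(s.bytes), whereas an arbitrary position inside a plain byte buffer carries no such guarantee.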
I want to write some data serialization and deserialization code to read and write data into a buffer. This data will be sent via a network socket between multiple machines. In this case, it would be reasonable to assume that the machines all have the same endianness, which avoids the overhead of converting between host and network byte order.
The question is how to do this?
For example, if I wanted to send an arbitrary sequence of data, how should I use alignof, or even should I use alignof, to ensure that the code is not at risk of undefined behaviour?
To provide a concrete example, I might want to serialize a uint64_t, followed by an int8_t, followed by a 32 bit float.
The naive way to do it is to write the 8 bytes of the uint64_t, followed by the single byte for the int8_t, followed by the 4 bytes for the float.
However, I think that while the first two elements will be correctly aligned, the final float will certainly not be.
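The offsets in this naive layout can be sketched as follows. The constant names and offset_is_aligned are hypothetical, and the offsets are relative to the start of the buffer, so this assumes the buffer itself starts at a suitably aligned address.

```cpp
#include <cstddef>
#include <cstdint>

// Offsets of the three fields when written back to back with no padding.
constexpr std::size_t kU64Offset   = 0;
constexpr std::size_t kI8Offset    = kU64Offset + sizeof(std::uint64_t);  // 8
constexpr std::size_t kFloatOffset = kI8Offset + sizeof(std::int8_t);     // 9

// Returns true if `offset` is a multiple of `alignment`.
constexpr bool offset_is_aligned(std::size_t offset, std::size_t alignment) {
    return offset % alignment == 0;
}
```

With a float that requires 4-byte alignment, offset 9 is not a multiple of 4, which is the misalignment I am worried about.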
Upvotes: 1
Views: 102
Reputation: 3553
The data must be aligned correctly to avoid potential issues with undefined behaviour.
That is not true, generally. You should be able to write safe programs, without undefined behaviour, without dealing at all with alignment. If you have any specific case against this idea, or any specific compiler/architecture that does not hold to this, please post it.
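As one possible illustration (a sketch, not the only way to do it): if values are copied in and out of the byte buffer with std::memcpy instead of being accessed through a cast pointer, the position inside the buffer does not need to be aligned. The helper names put and get are hypothetical, and the sketch assumes both ends share the same endianness and type sizes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Append the raw bytes of a trivially copyable value to the buffer.
// No padding is inserted; the value may land at any offset.
template <typename T>
void put(std::vector<unsigned char>& buf, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "put() only supports trivially copyable types");
    const std::size_t old_size = buf.size();
    buf.resize(old_size + sizeof(T));
    std::memcpy(buf.data() + old_size, &value, sizeof(T));
}

// Copy a value out of the buffer at `offset` and advance `offset`.
// Returns false (leaving `out` untouched) if too few bytes remain.
template <typename T>
bool get(const std::vector<unsigned char>& buf, std::size_t& offset, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "get() only supports trivially copyable types");
    if (offset > buf.size() || buf.size() - offset < sizeof(T)) {
        return false;
    }
    std::memcpy(&out, buf.data() + offset, sizeof(T));
    offset += sizeof(T);
    return true;
}
```

Because the destination of each memcpy is an ordinary, properly aligned variable, the float from the question can sit at offset 9 in the buffer without any misaligned access.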
On some platforms (for example x86) rather than undefined behaviour, there is a performance penalty.
Yes, some data types load into registers faster when they are properly aligned.
I understand that in many cases, the required alignment is the same width as the datatype
Yes, and the reason is the same as above. If it helps, you can imagine that registers are also "aligned" in some sense, so moving several bytes into a register is faster if the alignments match.
I want to write some data serialization and deserialization code to read and write data into a buffer. This data will be sent via a network socket between multiple machines. In this case, it would be reasonable to assume that the machines will all have the same endianness to avoid the overhead of needing to discuss converting between host and network byte order.
There are entire books dedicated to binary serialization formats, and yes, alignment, endianness, and precision are key factors in them. My actual answer, if you are in fact presented with the challenge of sending data over the network, is to stick with an already established cross-language binary protocol.
Examples:
Upvotes: 0