user2138149

Reputation: 17276

Should `alignof` be used when serializing data to a buffer in C++?

Recently I learned that when reading and writing data from a memory address, the data must be aligned correctly to avoid potential issues with undefined behaviour.

On some platforms (for example x86) rather than undefined behaviour, there is a performance penalty. This is caused by the compiler producing code which contains multiple loads in place of what would have been a single load for correctly aligned data.

Further, I understand that in many cases, the required alignment is the same width as the datatype. However, this is not a requirement or a rule and there may be exceptions to this.

My understanding is that the alignof operator can be used to correctly align data. This is similar to how sizeof can be used to correctly allocate memory size for data.
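As a minimal sketch of that understanding (the names `ThreeBytes` and `padding_for` are made up for illustration and are not from any library), `sizeof` and `alignof` can be queried at compile time, and `alignof` can be used to work out how many padding bytes would round an offset up to a type's alignment:

```cpp
#include <cstddef>
#include <cstdint>

// An array type whose size (3) differs from its alignment (1):
// alignof applied to an array type yields the alignment of its element type.
using ThreeBytes = char[3];
static_assert(sizeof(ThreeBytes) == 3, "three chars occupy three bytes");
static_assert(alignof(ThreeBytes) == 1, "char arrays have the alignment of char");

// Hypothetical helper: number of padding bytes needed so that
// `offset + padding` is a multiple of alignof(T).
template <typename T>
constexpr std::size_t padding_for(std::size_t offset) {
    return (alignof(T) - offset % alignof(T)) % alignof(T);
}

// A char never needs padding; any other type ends up on a multiple
// of its own alignment after the padding is added.
static_assert(padding_for<char>(9) == 0, "char has alignment 1");
static_assert((9 + padding_for<float>(9)) % alignof(float) == 0,
              "padded offset is a multiple of alignof(float)");
```

Note that such an offset is only meaningful relative to a buffer whose start address is itself suitably aligned.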

I want to write some data serialization and deserialization code to read and write data into a buffer. This data will be sent via a network socket between multiple machines. In this case, it would be reasonable to assume that the machines will all have the same endianness to avoid the overhead of needing to discuss converting between host and network byte order.

The question is how to do this.

For example, if I wanted to send an arbitrary sequence of data, how should I use `alignof`, or should I use it at all, to ensure that the code is not at risk of undefined behaviour?

To provide a concrete example, I might want to serialize a `uint64_t`, followed by an `int8_t`, followed by a 32-bit float.

The naive way to do it is to write the 8 bytes of the `uint64_t`, followed by the single byte for the `int8_t`, followed by the 4 bytes for the float.

However, I think that while the first two elements will be correctly aligned, the final float will certainly not be.
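To make the layout concrete, here is a rough sketch of the offsets in the tightly packed case. The constant names and `offset_is_aligned` are invented for this example, and it assumes 8-bit bytes and a 4-byte `float`, which holds on mainstream platforms but is not guaranteed by the standard:

```cpp
#include <cstddef>
#include <cstdint>

// Offsets of each field in the naive, tightly packed layout.
constexpr std::size_t kIdOffset    = 0;                                  // uint64_t
constexpr std::size_t kFlagOffset  = kIdOffset + sizeof(std::uint64_t);  // int8_t
constexpr std::size_t kValueOffset = kFlagOffset + sizeof(std::int8_t);  // float
constexpr std::size_t kPackedSize  = kValueOffset + sizeof(float);

// True when `offset` is a multiple of alignof(T).
template <typename T>
constexpr bool offset_is_aligned(std::size_t offset) {
    return offset % alignof(T) == 0;
}

// The first two fields land on suitably aligned offsets.
static_assert(offset_is_aligned<std::uint64_t>(kIdOffset), "offset 0 is always aligned");
static_assert(offset_is_aligned<std::int8_t>(kFlagOffset), "int8_t has alignment 1");
```

With these assumptions the float starts at offset 9, so `offset_is_aligned<float>(kValueOffset)` is false whenever `alignof(float)` is greater than 1.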

Upvotes: 1

Views: 102

Answers (1)

robermorales

Reputation: 3553

> The data must be aligned correctly to avoid potential issues with undefined behaviour.

That is not true in general. You should be able to write safe programs, free of undefined behaviour, without dealing with alignment at all. If you have a specific case that contradicts this, or a specific compiler or architecture where it does not hold, please post it.
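One common way to stay clear of alignment concerns, sketched here under the question's own assumption that both machines share endianness and type sizes, is to copy bytes in and out of the buffer with `std::memcpy` instead of casting buffer pointers to the target type. The helper names `put` and `get` are made up for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Append the raw bytes of `value` to the end of `buf`.
template <typename T>
void put(std::vector<unsigned char>& buf, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies need a trivially copyable type");
    const std::size_t pos = buf.size();
    buf.resize(pos + sizeof(T));
    std::memcpy(buf.data() + pos, &value, sizeof(T));
}

// Read a T starting at `offset` and advance `offset` past it.
// The destination `value` is a properly aligned local object, so the
// position of the bytes inside `buf` does not matter.
template <typename T>
T get(const std::vector<unsigned char>& buf, std::size_t& offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies need a trivially copyable type");
    assert(offset + sizeof(T) <= buf.size());
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    offset += sizeof(T);
    return value;
}
```

Calling `put` with a `uint64_t`, an `int8_t` and a `float` in that order produces the tightly packed layout from the question, and `get` reads the values back in the same order even though the float sits at an odd offset.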

> On some platforms (for example x86) rather than undefined behaviour, there is a performance penalty.

Yes, some data types load into registers faster when they are properly aligned.

> I understand that in many cases, the required alignment is the same width as the datatype

Yes, and the reason is the same as above. If it helps, you can imagine that registers are also "aligned", so moving several bytes into a register is faster when the alignments match.

> I want to write some data serialization and deserialization code to read and write data into a buffer. This data will be sent via a network socket between multiple machines. In this case, it would be reasonable to assume that the machines will all have the same endianness to avoid the overhead of needing to discuss converting between host and network byte order.

There are entire books dedicated to binary serialization formats, and yes, alignment, endianness, and precision are key factors in them. My actual answer, if you really do need to send data over the network, is to stick with an established cross-language binary protocol.

Examples:

  1. protobuf https://protobuf.dev/
  2. thrift https://thrift.apache.org/
  3. BSON (Binary JSON) https://bsonspec.org/

Upvotes: 0
