Reputation: 16089
I’m writing some serialization code that will work at a lower level than I’m used to. I need functions to take various value types (int32_t, int64_t, float, etc.) and shove them into a vector<unsigned char> in preparation for being written to a file. The file will be read and reconstituted in an analogous way.
The functions to write to the vector look like this:
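```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends value in big-endian order (most significant byte first).
void write_int32(std::vector<unsigned char>& buffer, int32_t value)
{
    buffer.push_back((value >> 24) & 0xff);
    buffer.push_back((value >> 16) & 0xff);
    buffer.push_back((value >> 8) & 0xff);
    buffer.push_back(value & 0xff);
}

// Reinterprets the float's bits as an int32_t via a pointer cast.
void write_float(std::vector<unsigned char>& buffer, float value)
{
    assert(sizeof(float) == sizeof(int32_t));
    write_int32(buffer, *(int32_t *)&value);
}
```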
These bit-shifting, type-punning atrocities seem to work on the single machine I’ve used so far, but they feel extremely fragile. Where can I learn which operations are guaranteed to yield the same results across architectures, float representations, etc.? Specifically, is there a safer way to do what I’ve done in these two example functions?
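One commonly used way to make these two functions safer is to do all shifting on unsigned types and to copy the float's bits with std::memcpy instead of casting pointers, which violates strict aliasing. The sketch below assumes both machines use IEEE 754 binary32 floats (checked at compile time with static_assert) and writes big-endian bytes, as the original write_int32 does. The names write_uint32, read_uint32, read_int32 and read_float are hypothetical helpers; the readers are included because the file is reconstituted the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Assumption: both ends use IEEE 754 binary32 floats; refuse to compile otherwise.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "float must be IEEE 754 binary32");

// Shifting an unsigned value is fully defined, unlike shifting a negative int.
void write_uint32(std::vector<unsigned char>& buffer, std::uint32_t value)
{
    buffer.push_back(static_cast<unsigned char>((value >> 24) & 0xff));
    buffer.push_back(static_cast<unsigned char>((value >> 16) & 0xff));
    buffer.push_back(static_cast<unsigned char>((value >> 8) & 0xff));
    buffer.push_back(static_cast<unsigned char>(value & 0xff));
}

void write_int32(std::vector<unsigned char>& buffer, std::int32_t value)
{
    // Signed-to-unsigned conversion is defined modulo 2^32.
    write_uint32(buffer, static_cast<std::uint32_t>(value));
}

void write_float(std::vector<unsigned char>& buffer, float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copies the bit pattern without aliasing
    write_uint32(buffer, bits);
}

// Reads four big-endian bytes starting at pos and advances pos.
std::uint32_t read_uint32(const std::vector<unsigned char>& buffer, std::size_t& pos)
{
    assert(pos + 4 <= buffer.size());
    std::uint32_t value = (std::uint32_t(buffer[pos]) << 24) |
                          (std::uint32_t(buffer[pos + 1]) << 16) |
                          (std::uint32_t(buffer[pos + 2]) << 8) |
                          std::uint32_t(buffer[pos + 3]);
    pos += 4;
    return value;
}

std::int32_t read_int32(const std::vector<unsigned char>& buffer, std::size_t& pos)
{
    std::uint32_t bits = read_uint32(buffer, pos);
    // int32_t is two's complement with no padding, so copying the bytes is exact;
    // a plain cast of values above INT32_MAX is implementation-defined before C++20.
    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

float read_float(const std::vector<unsigned char>& buffer, std::size_t& pos)
{
    std::uint32_t bits = read_uint32(buffer, pos);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the byte order comes from the shifts rather than from the host's memory layout, the integer path does not depend on endianness; the float path still rests on the IEEE 754 assumption, which is why it is checked instead of assumed silently. In C++20, std::bit_cast<std::uint32_t>(value) from <bit> can replace the memcpy.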
Upvotes: 0
Views: 1105
Reputation: 16089
I wanted something quick and lightweight, so I whipped up a simple and stupid text serialization format. Each value is written to the file using something barely more complicated than output_buffer << value << ' ';
Protocol Buffers would have worked okay, but I was worried they’d take too long to integrate. XML’s verbosity would have been a problem for me: I need to serialize thousands of values, and even having <a>...</a> wrapping each number would have added nearly a megabyte to each file. I tried MessagePack, but it just seemed like an awkward fit with C++’s static typing. What I came up with isn’t clever, but it works great.
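One caveat with writing floats through a bare output_buffer << value: streams default to 6 significant digits, so a value such as 1.0f / 3.0f does not read back identically. A minimal sketch of a round-trip-safe variant, assuming a std::ostringstream as the output buffer, sets the precision to std::numeric_limits<float>::max_digits10 (9 for IEEE 754 binary32). The helper names write_floats and read_floats are hypothetical.

```cpp
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Writes each float followed by a space, using max_digits10 significant digits
// so that parsing the text recovers exactly the same float.
std::string write_floats(const std::vector<float>& values)
{
    std::ostringstream out;
    out.precision(std::numeric_limits<float>::max_digits10);
    for (float v : values)
        out << v << ' ';
    return out.str();
}

// Parses space-separated floats until the input runs out.
std::vector<float> read_floats(const std::string& text)
{
    std::istringstream in(text);
    std::vector<float> values;
    float v;
    while (in >> v)
        values.push_back(v);
    return values;
}
```

The trade-off is slightly longer output (up to 9 significant digits per float instead of 6) in exchange for lossless round trips.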
Upvotes: 2
Reputation: 442
Usually the best way to do this is to use an external library designed for the purpose; it's all too easy to introduce cross-platform bugs, especially when transmitting data such as floating-point values. Several open-source options exist. One example is Google Protocol Buffers, which, in addition to being platform-neutral, is language-independent: it generates serialization code from message definitions you write.
Upvotes: 1
Reputation: 8926
A human-readable representation is the safest. XML with an XSD schema is one option; it lets you specify values and their format exactly.
If you really want a binary representation, look at the hton* and ntoh* functions:
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
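A minimal sketch of the hton*/ntoh* approach, assuming a POSIX system where htonl and ntohl are declared in <arpa/inet.h> (Winsock provides them on Windows). The helper names write_uint32_net and read_uint32_net are hypothetical. These functions cover only 16- and 32-bit unsigned integers; a float would first have its bits copied into a uint32_t with std::memcpy, which still assumes IEEE 754 floats on both ends.

```cpp
#include <arpa/inet.h>  // POSIX htonl/ntohl (assumption: POSIX platform)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends value in network (big-endian) byte order.
void write_uint32_net(std::vector<unsigned char>& buffer, std::uint32_t value)
{
    std::uint32_t net = htonl(value);  // host order -> network order
    unsigned char bytes[sizeof net];
    std::memcpy(bytes, &net, sizeof net);
    buffer.insert(buffer.end(), bytes, bytes + sizeof net);
}

// Reads four bytes in network order starting at pos and advances pos.
std::uint32_t read_uint32_net(const std::vector<unsigned char>& buffer, std::size_t& pos)
{
    assert(pos + 4 <= buffer.size());
    std::uint32_t net;
    std::memcpy(&net, buffer.data() + pos, sizeof net);
    pos += sizeof net;
    return ntohl(net);  // network order -> host order
}
```

On a big-endian host htonl and ntohl are no-ops, so the same code produces identical bytes everywhere without the caller writing any shifts.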
Upvotes: 2