bdesham

Reputation: 16089

Serializing values to a string of bytes in a platform-independent way

I’m writing some serialization code that will work at a lower level than I’m used to. I need functions to take various value types (int32_t, int64_t, float, etc.) and shove them into a vector<unsigned char> in preparation for being written to a file. The file will be read and reconstituted in an analogous way.

The functions to write to the vector look like this:

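#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit integer most-significant byte first (big-endian).
void write_int32(std::vector<unsigned char>& buffer, int32_t value)
{
    // Shift an unsigned copy: right-shifting a negative signed value is
    // implementation-defined before C++20.
    const uint32_t bits = static_cast<uint32_t>(value);
    buffer.push_back((bits >> 24) & 0xff);
    buffer.push_back((bits >> 16) & 0xff);
    buffer.push_back((bits >> 8) & 0xff);
    buffer.push_back(bits & 0xff);
}

void write_float(std::vector<unsigned char>& buffer, float value)
{
    static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

    // Copy the bit pattern with memcpy; *(int32_t *)&value breaks strict aliasing.
    int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    write_int32(buffer, bits);
}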

These bit-shifting, type-punning atrocities seem to work on the single machine I’ve used so far, but they feel extremely fragile. Where can I learn which operations are guaranteed to yield the same results across architectures, float representations, etc.? Specifically, is there a safer way to do what I’ve done in these two example functions?
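The question says the file will be "reconstituted in an analogous way." Below is a minimal sketch of matching decoders. It assumes the big-endian layout written by write_int32, and that both machines use the same 32-bit float format. The helper names write_uint32_be, read_uint32_be, read_int32 and read_float, and the offset parameter, are hypothetical. A writer is repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: append a 32-bit value most-significant byte first.
void write_uint32_be(std::vector<unsigned char>& buffer, uint32_t bits)
{
    buffer.push_back(static_cast<unsigned char>((bits >> 24) & 0xff));
    buffer.push_back(static_cast<unsigned char>((bits >> 16) & 0xff));
    buffer.push_back(static_cast<unsigned char>((bits >> 8) & 0xff));
    buffer.push_back(static_cast<unsigned char>(bits & 0xff));
}

// Same byte layout as the question's write_float, but memcpy replaces the pointer cast.
void write_float(std::vector<unsigned char>& buffer, float value)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    write_uint32_be(buffer, bits);
}

// Hypothetical helper: rebuild a 32-bit value from four big-endian bytes at offset.
uint32_t read_uint32_be(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    assert(offset + 4 <= buffer.size());  // caller must supply four bytes
    return (static_cast<uint32_t>(buffer[offset]) << 24) |
           (static_cast<uint32_t>(buffer[offset + 1]) << 16) |
           (static_cast<uint32_t>(buffer[offset + 2]) << 8) |
           static_cast<uint32_t>(buffer[offset + 3]);
}

int32_t read_int32(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    // int32_t is required to be two's complement, so copying the bits back
    // recovers negative values without an implementation-defined conversion.
    uint32_t bits = read_uint32_be(buffer, offset);
    int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

float read_float(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    uint32_t bits = read_uint32_be(buffer, offset);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

The decoder rebuilds each value with shifts and ORs. As a result it never depends on the host's own byte order, only on the agreed file layout.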

Upvotes: 0

Views: 1105

Answers (3)

bdesham

Reputation: 16089

I wanted something quick and lightweight, so I whipped up a simple and stupid text serialization format. Each value is written to the file using something barely more complicated than

output_buffer << value << ' ';
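A text format like this has one detail worth checking. With the default stream precision of 6 significant digits, many floats (1.0f / 3.0f, for example) do not parse back to the same value. The sketch below is minimal and rests on a few assumptions:

- values are whitespace-separated floats;
- std::ostringstream and std::istringstream stand in for the file streams;
- write_floats and read_floats are hypothetical names.

It sets the precision to std::numeric_limits<float>::max_digits10, which is enough significant digits for each float to parse back to the identical value.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Write floats separated by spaces, with enough digits to round-trip exactly.
std::string write_floats(const std::vector<float>& values)
{
    std::ostringstream output_buffer;
    output_buffer.precision(std::numeric_limits<float>::max_digits10);
    for (float value : values)
        output_buffer << value << ' ';
    return output_buffer.str();
}

// Read whitespace-separated floats until the input runs out.
std::vector<float> read_floats(const std::string& text)
{
    std::istringstream input(text);
    std::vector<float> values;
    float value;
    while (input >> value)
        values.push_back(value);
    // Stopping anywhere but end-of-input means a token failed to parse.
    assert(input.eof());
    return values;
}
```

The cost is longer numbers in the file, up to nine significant digits per float. That is the usual trade-off for exact text round-trips.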

Protocol Buffers would have worked okay, but I was worried they’d take too long to integrate. XML’s verbosity would have been a problem for me: I need to serialize thousands of values, and even having <a>...</a> wrapping each number would have added nearly a megabyte to each file. I tried MessagePack, but it just seemed like an awkward fit with C++’s static typing. What I came up with isn’t clever, but it works great.

Upvotes: 2

Jack O'Reilly

Reputation: 442

Usually the best way to do this is to use an external library designed for the purpose -- it's all too easy to introduce platform-disagreement bugs, especially when transmitting floating-point types. Several open-source libraries handle this. One example is Google Protocol Buffers, which, in addition to being platform-neutral, is language-independent: it generates serialization code from the messages you define.
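Part of the floating-point risk is that copying raw bits only works between machines that share a float format. If you hand-roll the encoding instead of using a library, one hedge is to make that assumption explicit. This minimal sketch (all function names hypothetical) has two parts:

- It refuses to compile unless float and double are IEEE 754, using std::numeric_limits<T>::is_iec559.
- It extends the question's byte-by-byte big-endian approach to int64_t and double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Fail the build on platforms whose float/double are not IEEE 754 (IEC 559);
// the raw bit copies below only round-trip between machines sharing that format.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

// Append a 64-bit value most-significant byte first.
void write_uint64_be(std::vector<unsigned char>& buffer, uint64_t bits)
{
    for (int shift = 56; shift >= 0; shift -= 8)
        buffer.push_back(static_cast<unsigned char>((bits >> shift) & 0xff));
}

void write_int64(std::vector<unsigned char>& buffer, int64_t value)
{
    // Converting to unsigned first is well-defined and avoids shifting a sign bit.
    write_uint64_be(buffer, static_cast<uint64_t>(value));
}

void write_double(std::vector<unsigned char>& buffer, double value)
{
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    write_uint64_be(buffer, bits);
}

// Rebuild a 64-bit value from eight big-endian bytes starting at offset.
uint64_t read_uint64_be(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    assert(offset + 8 <= buffer.size());  // caller must supply eight bytes
    uint64_t bits = 0;
    for (std::size_t i = 0; i < 8; ++i)
        bits = (bits << 8) | buffer[offset + i];
    return bits;
}

int64_t read_int64(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    uint64_t bits = read_uint64_be(buffer, offset);
    int64_t value;
    std::memcpy(&value, &bits, sizeof value);  // int64_t is two's complement
    return value;
}

double read_double(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    uint64_t bits = read_uint64_be(buffer, offset);
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```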

Upvotes: 1

Rob K

Reputation: 8926

A human-readable representation is the safest. XML with an XSD is one option; it lets you specify each value's type and format exactly.

If you really want a binary representation, look at the hton* and ntoh* functions:

http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
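Here is a minimal sketch of that approach. It assumes a POSIX system, where htonl and ntohl come from <arpa/inet.h>; Windows declares them in <winsock2.h>. Two limits apply:

- POSIX only specifies 16- and 32-bit versions (htons/htonl and ntohs/ntohl), so 64-bit values need separate handling.
- For floats these functions only fix byte order, so both ends must still agree on the float format.

The *_net function names are hypothetical.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value in network (big-endian) byte order.
void write_uint32_net(std::vector<unsigned char>& buffer, uint32_t host_value)
{
    uint32_t net_value = htonl(host_value);
    unsigned char bytes[sizeof net_value];
    std::memcpy(bytes, &net_value, sizeof bytes);
    buffer.insert(buffer.end(), bytes, bytes + sizeof bytes);
}

// Read four network-order bytes at offset and convert back to host order.
uint32_t read_uint32_net(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    assert(offset + sizeof(uint32_t) <= buffer.size());
    uint32_t net_value;
    std::memcpy(&net_value, buffer.data() + offset, sizeof net_value);
    return ntohl(net_value);
}

// htonl only reorders bytes: the float travels as its raw bit pattern.
void write_float_net(std::vector<unsigned char>& buffer, float value)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    write_uint32_net(buffer, bits);
}

float read_float_net(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    uint32_t bits = read_uint32_net(buffer, offset);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```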

Upvotes: 2
