Reputation: 13313
I have tried to read carefully all the advice given in the C++ FAQ on this subject. I implemented my system according to item 36.8, and now, after a few months (with a lot of data serialized), I want to change both the public interface of some of the classes and the inheritance structure itself.
class Base
{
public:
    Vector field1() const;
    Vector field2() const;
    Vector field3() const;
    std::string name() const {return "Base";}
};
class Derived : public Base
{
public:
    std::string name() const {return "Derived";}
};
I would like to know how to make changes such as:
Split Derived into Derived1 and Derived2, while mapping the original Derived into Derived1 for existing data.
Split Base::field1() into Base::field1a() and Base::field1b(), while mapping field1 to field1a and having field1b empty for existing data.
I will have to
I would like to know how to make the serialization more flexible, so that when I decide to make some change in the future, I would not be facing conversion hell like now.
I thought of making a system that would use numbers instead of names to serialize my objects, for example Base = 1, Derived1 = 2, and so on, together with a separate number-to-name table that converts numbers to names. When I want to change the name of some class, I would then change it only in this number-to-name table, without touching the data.
The problems with this approach are:
The system would be brittle. That is, changing anything in the number-to-name system could change the meaning of gigabytes of data.
The serialized data would lose some of its human readability, since in the serialized data, there would be numbers instead of names.
I am sorry for putting so many issues into one question, but I am inexperienced at programming and the problem I am facing seems so overwhelming that I just do not know where to start.
Any general materials, tutorials, idioms or literature on flexible serialization are most welcome!
Upvotes: 1
Views: 192
Reputation: 153967
It's probably a bit late for that now, but whenever designing a serialization format, you should provide for versioning. The version can be mangled into the type information in the stream, or treated as a separate (integer) field. When writing the class out, you always write the latest version. When reading, you have to read both the type and the version before you can construct; if you're using the static map suggested in the FAQ, then the key would be:
struct DeserializeKey
{
    std::string type;
    int version;
};
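To make the lookup concrete, here is a minimal sketch of such a map. The names `Factory`, `registry` and `deserialize`, the cut-down `Base`, and the whitespace-separated "type version" header are all assumptions made for illustration; they are not taken from the FAQ. Note that a struct used as a `std::map` key needs an ordering, which the sketch supplies with `operator<`.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>

// Cut-down stand-in for the hierarchy in the question (assumption).
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct DeserializeKey {
    std::string type;
    int version;
};

// std::map requires a strict weak ordering on its key type.
inline bool operator<(const DeserializeKey& a, const DeserializeKey& b) {
    return std::tie(a.type, a.version) < std::tie(b.type, b.version);
}

// Hypothetical factory signature: build the object from the rest of the stream.
using Factory = std::function<std::unique_ptr<Base>(std::istream&)>;

// Function-local static sidesteps static-initialization-order problems.
inline std::map<DeserializeKey, Factory>& registry() {
    static std::map<DeserializeKey, Factory> instance;
    return instance;
}

// Reads the type and the version first, then dispatches to the factory.
// The "<type> <version>" header layout is an assumption of this sketch.
inline std::unique_ptr<Base> deserialize(std::istream& in) {
    DeserializeKey key;
    if (!(in >> key.type >> key.version))
        throw std::runtime_error("missing type or version");
    auto it = registry().find(key);
    if (it == registry().end())
        throw std::runtime_error("unknown type/version: " + key.type);
    return it->second(in);
}
```

Each class (and each supported older version of it) would register one factory in `registry()`; an unknown combination is reported as an error rather than silently misread.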
Given the situation you are in now, the solution is probably to mangle the version into the type name in a clearly recognizable way, say something along the lines of type_name__version; if the type_name isn't followed by two underscores, then use 0. This isn't the most efficient method, but it's usually acceptable, and it will solve the problem of backwards compatibility while providing for evolution in the future.
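As a rough sketch of the mangling scheme, the two helpers below split and build such names. The function names `parseMangledName` and `mangleName` are hypothetical, and the sketch assumes that no real class name ends in two underscores followed by digits; it also does not guard against integer overflow in the version number.

```cpp
#include <cstddef>
#include <string>

struct DeserializeKey {
    std::string type;
    int version;
};

// Splits "type_name__version" into its parts. A name without a valid
// "__<digits>" suffix is treated as version 0, i.e. data written before
// versioning was introduced.
inline DeserializeKey parseMangledName(const std::string& mangled) {
    const std::size_t pos = mangled.rfind("__");
    if (pos == std::string::npos || pos == 0 || pos + 2 >= mangled.size())
        return {mangled, 0};
    int version = 0;
    for (std::size_t i = pos + 2; i < mangled.size(); ++i) {
        const char c = mangled[i];
        if (c < '0' || c > '9')
            return {mangled, 0};  // suffix is not a number: part of the name
        version = version * 10 + (c - '0');
    }
    return {mangled.substr(0, pos), version};
}

// Inverse operation, used when writing: always emit the latest version.
inline std::string mangleName(const DeserializeKey& key) {
    return key.type + "__" + std::to_string(key.version);
}
```

With this, a name such as Derived read from existing data yields version 0, while newly written data carries an explicit suffix such as Derived1__1.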
For your precise questions:
In this case, Derived is just a previous version of Derived1. You can insert the necessary factory function into the map under the appropriate key.
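A sketch of what that registration might look like follows. The `Key` and `Factory` aliases and `makeRegistry` are hypothetical, `name()` is made virtual here for the sake of the demonstration (it is not in the question), and numbering the new classes as version 1 is an assumption.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};
struct Derived1 : Base {
    std::string name() const override { return "Derived1"; }
};
struct Derived2 : Base {
    std::string name() const override { return "Derived2"; }
};

// Key is (type name, version) as read from the stream.
using Key = std::pair<std::string, int>;
using Factory = std::function<std::unique_ptr<Base>()>;

inline std::map<Key, Factory> makeRegistry() {
    std::map<Key, Factory> reg;
    auto makeDerived1 = [] { return std::unique_ptr<Base>(new Derived1); };
    reg[Key("Derived", 0)]  = makeDerived1;  // legacy name found in existing data
    reg[Key("Derived1", 1)] = makeDerived1;  // name written from now on
    reg[Key("Derived2", 1)] = [] { return std::unique_ptr<Base>(new Derived2); };
    return reg;
}
```

The old name never has to be rewritten on disk: it simply stays in the map as another key that produces a Derived1.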
This is just classical versioning. Version 0 of Base has a field1 attribute, and when you deserialize, you use it to initialize field1a, and you initialize field1b empty.
Version 1 of Base has both field1a and field1b.
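The version switch for Base could be sketched roughly as follows. Everything concrete here is an assumption: `Vector` is replaced by `std::vector<double>`, the wire format (an element count followed by the elements) is invented, the new layout is numbered version 1, and field2 and field3 are left out for brevity.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the question's Vector type (assumption).
using Vector = std::vector<double>;

struct Base {
    Vector field1a;
    Vector field1b;
};

// Hypothetical wire format: element count followed by the elements.
inline Vector readVector(std::istream& in) {
    std::size_t n = 0;
    if (!(in >> n)) throw std::runtime_error("bad vector length");
    Vector v(n);
    for (double& x : v)
        if (!(in >> x)) throw std::runtime_error("bad vector element");
    return v;
}

// Builds the current in-memory Base from whichever layout is on disk.
inline Base readBase(std::istream& in, int version) {
    Base b;
    switch (version) {
    case 0:  // old layout: a single field1, which becomes field1a
        b.field1a = readVector(in);
        // field1b is left empty for old data
        break;
    case 1:  // new layout: field1a followed by field1b
        b.field1a = readVector(in);
        b.field1b = readVector(in);
        break;
    default:
        throw std::runtime_error("unsupported Base version");
    }
    return b;
}
```

Writing always uses the newest layout, so only the reading side accumulates one case per supported version.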
If you mangle the version into the type name, as I suggest above, you shouldn't have to convert any existing data. Long term, of course, either some of the older versions simply disappear from your data sets, so that you can remove the support for them, or your program keeps getting bigger, with support for lots of older versions. In practice, I've usually seen the latter.
Upvotes: 2