Reputation: 2426
Summary:
I have a struct that is read from and written to a file.
This struct changes frequently, and this causes my read() function to get complex.
I need to find a good way to handle change while keeping the bug count low. Ideally, the code should make it easy to spot the changes between versions.
I have thought through a couple of patterns but I don't know if I have gone through all possible options.
As you will see, the code is mostly C-like, but I am in the process of turning it into C++.
Details
As I said, my struct changes frequently (almost in every version of the program).
So far, changes to the struct have been handled like this. Version 1:
struct Obj {
int color_index;
};
void Read_Obj( File *f, Obj *o ) {
f->read( f, &o->color_index );
}
void Write_Obj( File *f, Obj *o ) {
f->write( f, o->color_index );
}
Version 2:
struct Obj {
int color_r;
int color_g;
int color_b;
};
void Read_Obj( File *f, Obj *o ) {
if( f->version() == File::Version1 ) {
int color_index;
f->read( f, &color_index );
ColorIndex_to_RGB( o, color_index ); // we used color maps back then
}
else {
f->read( f, &o->color_r );
f->read( f, &o->color_g );
f->read( f, &o->color_b );
}
}
void Write_Obj( File *f, Obj *o ) {
f->write( f, o->color_r );
f->write( f, o->color_g );
f->write( f, o->color_b );
}
[brief note]
Note that I know I could have used
void Read_Obj( File *f, Obj *o ) {
if( f->version() == File::Version1 ) {
Read_Obj_V1( f, o );
}
else {
Read_Obj_V2( f, o );
}
}
but that tends to cause code duplication between the sub-functions since, in real life, only 1-2 of the ~20 members of the struct change in each version, so the other ~18 lines remain the same.
Of course, I could switch to this approach if there is a good reason to.
[end of brief note]
Now these structs have become complicated, and I need to convert them to classes and work in a more object-oriented fashion.
I have seen a pattern where you use one class per old version to read the data, and then convert that data to the newer class.
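class Obj;  // forward declaration, needed by Obj_v1::convert_to()
class Obj_v1 {
public:
    void read( File *f ) {
        f->read( f, &m_color_index );
    }
    void convert_to( Obj *o ) { /* code to convert the older object */ }
private:
    int m_color_index;
};
class Obj {
public:
    void read( File *f ) {
        f->read( f, &m_r );
        f->read( f, &m_g );
        f->read( f, &m_b );
    }
    void write( File *f ) {
        f->write( f, m_r );
        f->write( f, m_g );
        f->write( f, m_b );
    }
private:
    int m_r;
    int m_g;
    int m_b;
};
void Read_Obj( File *f, Obj *o ) {
    if( f->version() == File::Version1 ) {
        Obj_v1 old;  // "Obj_v1 old();" would declare a function, not an object
        old.read( f );
        old.convert_to( o );
    }
    else {
        o->read( f );
    }
}
void Write_Obj( File *f, Obj *o ) {
    o->write( f );
}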
However, there are two strategies for dealing with change:
Strategy 1: direct conversions
void Read_Obj( File *f, Obj *o ) {
    if( f->version() == File::Version1 ) {
        Obj_v1 old;
        old.read( f );
        old.convert_to( o );
    }
    else if( f->version() == File::Version2 ) {
        Obj_v2 old;
        old.read( f );
        old.convert_to( o );
    }
    else {
        o->read( f );
    }
}
Disadvantage: you have to update the convert_to() of all Obj_vX classes each time you change the Obj class. Too many possibilities for bugs are thrown in each time.
Benefit:
Strategy 2: cascaded conversions
void Read_Obj( File *f, Obj *o ) {
    Obj_v1 o1;
    Obj_v2 o2;
    if( f->version() == File::Version1 ) {
        o1.read( f );
        o1.convert_to( &o2 );  // v1 -> v2
        o2.convert_to( o );    // v2 -> current
    }
    else if( f->version() == File::Version2 ) {
        o2.read( f );
        o2.convert_to( o );
    }
    else {
        o->read( f );
    }
}
Disadvantages:
Some information may exist in v1 that was useless in v3 but that v5 could make use of; however, the cascaded conversions have already wiped out this data.
Objects from older versions will tend to take longer to create, since they pass through every intermediate converter.
Benefit: you only have to write one new convert_to() each time you change the Obj class. However, one bug in one of the converters along the chain could have more severe effects and could wreck the consistency of the database. On the other hand, you have a better chance of finding such a bug.
Worries:
Question:
Are there any other patterns that do a better job at this?
For those of you who have experience with the approaches I proposed, what do you think of my worries about the above implementations?
Which solutions are preferable?
Thank you so much.
Upvotes: 4
Views: 416
Reputation: 299960
You may be able to put Google Protocol Buffers to work.
The main idea behind protobuf is to decouple the actual serialization from the class information, because you create a class dedicated to serialization... but the real benefit lies elsewhere.
The information encoded by protobuf is naturally both backward and forward compatible: if you add information and decode an old file, the new information simply won't be there; if you remove information, the decoder will skip it.
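To illustrate that compatibility idea (this is not protobuf itself, and not its API or wire format), here is a minimal C++17 sketch of a hand-rolled tag/value record. The names Record, OldObj, NewObj, encode_old() and decode_new() are hypothetical. The reader assigns the tags it knows, keeps defaults for tags that are absent, and skips tags it does not recognize:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical hand-rolled tag/value encoding that only illustrates the idea;
// it is NOT the protobuf API or wire format.
// Every field is stored as (tag, value) and readers dispatch on the tag.
using Record = std::vector<std::pair<std::uint32_t, std::int32_t>>;

// Old schema: tag 1 = color_index, tag 2 = legacy_id.
struct OldObj {
    std::int32_t color_index;
    std::int32_t legacy_id;
};

// New schema: legacy_id was removed, alpha (tag 3) was added.
struct NewObj {
    std::int32_t color_index = 0;
    std::int32_t alpha = 255;  // default used when old data lacks the field
};

Record encode_old(const OldObj& o) {
    return {{1, o.color_index}, {2, o.legacy_id}};
}

NewObj decode_new(const Record& rec) {
    NewObj o;
    for (const auto& [tag, value] : rec) {
        switch (tag) {
            case 1: o.color_index = value; break;
            case 3: o.alpha = value; break;
            default: break;  // removed or unknown field (e.g. tag 2): skip it
        }
    }
    return o;
}
```

Decoding decode_new(encode_old(OldObj{7, 42})) keeps color_index at 7, skips the removed legacy_id, and leaves alpha at its default of 255. No version number is involved; the tags carry the compatibility.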
This means that you leave the version handling to protobuf (without any real version number, in fact), and then, when changing your class:
It may also help you think more carefully about what to save and in which format. It is okay to transform the data before saving it (encoding) and transform it back when reading (decoding), so the actual saved format should change less often (you would add fields, but you should not have to refactor already-encoded data very frequently).
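A minimal sketch of that encode/decode step, with all names hypothetical: the in-memory Color class stores a packed 0xRRGGBB value, while the saved ColorRecord keeps three separate channels. The in-memory representation can then change without altering what is written to disk:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stable on-disk record: three separate 0-255 channels.
struct ColorRecord {
    std::uint8_t r, g, b;
};

// Hypothetical in-memory class: the program prefers a packed 0xRRGGBB value.
// This representation can change later without touching ColorRecord.
class Color {
public:
    explicit Color(std::uint32_t rgb = 0) : rgb_(rgb & 0xFFFFFFu) {}
    std::uint32_t rgb() const { return rgb_; }
private:
    std::uint32_t rgb_;
};

// Encoding: transform the in-memory value into the saved format.
ColorRecord encode(const Color& c) {
    return ColorRecord{
        static_cast<std::uint8_t>((c.rgb() >> 16) & 0xFF),
        static_cast<std::uint8_t>((c.rgb() >> 8) & 0xFF),
        static_cast<std::uint8_t>(c.rgb() & 0xFF)};
}

// Decoding: transform the saved format back into the in-memory value.
Color decode(const ColorRecord& rec) {
    return Color((std::uint32_t{rec.r} << 16) | (std::uint32_t{rec.g} << 8) | rec.b);
}
```

For example, encode(Color(0x123456)) yields channels 0x12, 0x34 and 0x56, and decode() turns them back into 0x123456.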
Upvotes: 2
Reputation: 17420
void Read_Obj( File *f, Obj *o ) {
if( f->version() == File::Version1 ) {
The if is, so to speak, a hidden switch/case, and a switch/case in C++ is generally interchangeable with polymorphism. Example:
struct Reader {
    virtual ~Reader() = default;
    virtual void Read_Obj( File *f, Obj *o ) = 0;
    /* methods to read further objects */
};
struct ReaderV1 : public Reader {
    void Read_Obj( File *f, Obj *o ) override { /* ... */ }
    /* methods to read further objects */
};
struct ReaderV2 : public Reader {
    void Read_Obj( File *f, Obj *o ) override { /* ... */ }
    /* methods to read further objects */
};
Then instantiate the appropriate Reader descendant after opening the file and detecting the version number. That way you have only one file-version check in the top-level code, instead of polluting all of the low-level code with checks.
If some code is common between file versions, for convenience you can also put it into the base reader class.
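A minimal sketch of that top-level setup, assuming std::istream in place of the question's File and a hypothetical file layout (a version number followed by the object data). The names make_reader(), load() and read_name(), and the gray-level field in version 1, are illustrative only. The version is checked exactly once, and code shared by all versions lives in the base class:

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>  // std::istringstream, handy for trying load() in memory
#include <stdexcept>
#include <string>

// Hypothetical up-to-date object (fields chosen for illustration only).
struct Obj {
    std::string name;
    int r = 0, g = 0, b = 0;
};

// Base reader: one virtual read per object type, plus helpers shared by all versions.
class Reader {
public:
    virtual ~Reader() = default;
    virtual void Read_Obj(std::istream& in, Obj& o) = 0;

protected:
    // Common code: in this sketch the name is stored the same way in every version.
    void read_name(std::istream& in, Obj& o) { in >> o.name; }
};

class ReaderV1 : public Reader {
public:
    void Read_Obj(std::istream& in, Obj& o) override {
        read_name(in, o);
        int gray = 0;  // hypothetical: v1 stored a single gray level
        in >> gray;
        o.r = o.g = o.b = gray;
    }
};

class ReaderV2 : public Reader {
public:
    void Read_Obj(std::istream& in, Obj& o) override {
        read_name(in, o);
        in >> o.r >> o.g >> o.b;
    }
};

// The only version check: pick the reader once, right after opening the file.
std::unique_ptr<Reader> make_reader(int version) {
    switch (version) {
        case 1: return std::make_unique<ReaderV1>();
        case 2: return std::make_unique<ReaderV2>();
        default: throw std::runtime_error("unsupported file version");
    }
}

// Reads the version header, then delegates everything else to the chosen reader.
Obj load(std::istream& in) {
    int version = 0;
    in >> version;
    std::unique_ptr<Reader> reader = make_reader(version);
    Obj o;
    reader->Read_Obj(in, o);
    return o;
}
```

For instance, loading std::istringstream("1 lamp 200") goes through ReaderV1 and yields an Obj with all three channels set to 200, while "2 sky 10 20 30" goes through ReaderV2. Adding a version 3 means adding one class and one case in make_reader().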
I would strongly advise against the variant with class Obj_v1 and class Obj where the read() method belongs to Obj itself. This way one easily ends up with circular dependencies, and it is also a bad idea to make an object aware of its persistent representation. IME (in my experience), it is better to make a third-party reader class hierarchy responsible for that. (As with std::iostream vs. std::string vs. operator <<: the stream doesn't know the string, the string doesn't know the stream, and only operator << knows both.)
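A minimal sketch of that separation, using a hypothetical RGB Obj: the class has no I/O code at all, and only the free operator<< and operator>> know both the stream and the object, mirroring how std::string and the streams stay independent of each other:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream for in-memory round trips

// The domain object knows nothing about files or streams.
class Obj {
public:
    Obj() = default;
    Obj(int r, int g, int b) : r_(r), g_(g), b_(b) {}
    int r() const { return r_; }
    int g() const { return g_; }
    int b() const { return b_; }
    void set_rgb(int r, int g, int b) { r_ = r; g_ = g; b_ = b; }
private:
    int r_ = 0, g_ = 0, b_ = 0;
};

// Only these free functions know both the stream and Obj,
// just as operator<< knows both std::ostream and std::string.
std::ostream& operator<<(std::ostream& out, const Obj& o) {
    return out << o.r() << ' ' << o.g() << ' ' << o.b();
}

std::istream& operator>>(std::istream& in, Obj& o) {
    int r = 0, g = 0, b = 0;
    if (in >> r >> g >> b) o.set_rgb(r, g, b);  // leave o untouched on failure
    return in;
}
```

Because Obj never includes any stream or file header, the persistence code can be replaced (for example by the versioned reader hierarchy) without touching the class.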
Otherwise, I personally do not see any big difference between your "Strategy 1" and "Strategy 2". They both use convert_to(), which I personally think is superfluous. IME, the polymorphic solution should be used instead: it converts everything directly to the up-to-date version of the object, class Obj, without the intermediate classes Obj_v1 and Obj_v2. Since with polymorphism you have a dedicated read function for every version, ensuring proper object recreation from the read information is easy.
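A minimal sketch of that idea applied to the question's color example, assuming std::istream in place of File and a hypothetical three-entry palette standing in for ColorIndex_to_RGB(). Each version gets its own reader that fills the current Obj directly, so no Obj_v1 or Obj_v2 class exists:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream for in-memory testing
#include <stdexcept>

// The single, up-to-date object; there is no Obj_v1 or Obj_v2.
struct Obj { int r = 0, g = 0, b = 0; };

struct Reader {
    virtual ~Reader() = default;
    virtual void Read_Obj(std::istream& in, Obj& o) = 0;
};

// Version 1 files stored a color-map index; convert it while reading.
struct ReaderV1 : Reader {
    void Read_Obj(std::istream& in, Obj& o) override {
        // Hypothetical palette standing in for the question's ColorIndex_to_RGB().
        static const std::array<std::array<int, 3>, 3> palette{{
            {{0, 0, 0}}, {{255, 0, 0}}, {{0, 0, 255}}}};
        std::size_t index = 0;
        in >> index;
        if (index >= palette.size()) throw std::runtime_error("bad color index");
        o.r = palette[index][0];
        o.g = palette[index][1];
        o.b = palette[index][2];
    }
};

// Version 2 files store RGB directly.
struct ReaderV2 : Reader {
    void Read_Obj(std::istream& in, Obj& o) override { in >> o.r >> o.g >> o.b; }
};
```

When Obj changes again, each existing reader is updated to fill the new Obj, so every old format keeps a single, direct path to the current class instead of a chain of converters.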
Are there any other patterns that do a better job at this? The ones of you that had some experience with my proposals, what do you think of my worries on the above implementations? Which are preferable solutions?
This is precisely what polymorphism was intended to address and how I generally do such tasks myself.
This is related to object serialization, but I have not seen a single serialization framework (my info is likely outdated) that was capable of supporting several versions of the same class.
I have personally ended up several times with the following serialization/deserialization class hierarchy:
Hope that helps.
Upvotes: 3