Writing a class/struct that changes frequently

Question

Summary:
I have a struct that is read/written to file.
This struct changes frequently, and this causes my read() function to get complex.

I need to find a good way to handle change while keeping the bug count low. Ideally, the code should make it easy to spot the changes between versions.

I have thought through a couple of patterns but I don't know if I have gone through all possible options.

As you will see, the code is mostly C-like, but I am in the process of turning it into C++.

Details
As I said, my struct changes frequently (almost in every version of the program).

Some members are deleted, some are added, and some are made more complex. It is not the simple case where a new member just appears in the structure.

So far, changes to the struct have been handled like:

in version_1, I used a color map table:


struct Obj {
    int color_index;  
};  

void Read_Obj( File *f, Obj *o ) {
    f->read( f, &o->color_index );
}

void Write_Obj( File *f, Obj *o ) {
    f->write( f, o->color_index );
}

in the next version, I changed it into [r,g,b] form


struct Obj {
    int color_r;
    int color_g;
    int color_b;  
};  

void Read_Obj( File *f, Obj *o ) {

    if( f->version() == File::Version1 ) {
        int color_index;
        f->read( f, &color_index );
        ColorIndex_to_RGB( o, color_index ); // we used color maps back then
    }
    else {
        f->read( f, &o->color_r );
        f->read( f, &o->color_g );
        f->read( f, &o->color_b );
    }
}      

void Write_Obj( File *f, Obj *o ) {
    f->write( f, o->color_r );
    f->write( f, o->color_g );
    f->write( f, o->color_b );
}

[brief note]

Note here that I know I could have used


void Read_Obj( File *f, Obj *o ) {

    if( f->version() == File::Version1 ) {
        Read_Obj_V1( f, o );
    }
    else {
        Read_Obj_V2( f, o );
    }
}

but that tends to cause code duplication between the sub-functions since, in real life, only 1-2 out of ~20 members of the struct change in each version. So the other 18 lines remain the same.

Of course, I could switch to this policy if there is a good reason.

[end of brief note]

Now these structs have become complicated and I need to convert them to a class, and work in a more object-oriented fashion.

I have seen a pattern where you use one class per old version to read that version's data, and then convert the data to the newest class.


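class Obj;   // forward declaration, needed by Obj_v1::convert_to()

class Obj_v1 {
    int m_color_index;
public:
    void read( File *f ) {
        f->read( f, &m_color_index );
    }

    void convert_to( Obj *o ) { /* code to convert the older object */ }
};

class Obj {
    int m_r;
    int m_g;
    int m_b;
public:
    void read( File *f ) {
        f->read( f, &m_r );
        f->read( f, &m_g );
        f->read( f, &m_b );
    }

    // always writes the current format
    void write( File *f ) {
        f->write( f, m_r );
        f->write( f, m_g );
        f->write( f, m_b );
    }
};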

void Read_Obj( File *f, Obj *o ) {

    if( f->version() == File::Version1 ) {
        Obj_v1 old;   // not "Obj_v1 old();", which would declare a function
        old.read( f );
        old.convert_to( o );
    }
    else {
        o->read( f );
    }
}

void Write_Obj( File *f, Obj *o ) {
    o->write( f );
}

However, there are two strategies for dealing with change:

Strategy 1 : direct conversions

void Read_Obj( File *f, Obj *o ) {

    if( f->version() == File::Version1 ) {
        Obj_v1 old;
        old.read( f );
        old.convert_to( o );   // v1 -> current
    }
    else if( f->version() == File::Version2 ) {
        Obj_v2 old;
        old.read( f );
        old.convert_to( o );   // v2 -> current
    }
    else {
        o->read( f );
    }
}

Disadvantage:

This implies that you have to update the convert_to() of all Obj_vX classes each time you change the Obj class. That leaves too many opportunities to introduce bugs each time.

Benefit:

You can always map an old concept (struct) directly onto the newest one. Compare this with the cascaded strategy (next), where some information may be lost along the way and so can no longer be used.

Strategy 2 : cascaded conversions

void Read_Obj( File *f, Obj *o ) {

    Obj_v1 o1;
    Obj_v2 o2;

    if( f->version() == File::Version1 ) {
        o1.read( f );
        o1.convert_to( &o2 );   // v1 -> v2
        o2.convert_to( o );     // v2 -> current
    }
    else if( f->version() == File::Version2 ) {
        o2.read( f );
        o2.convert_to( o );     // v2 -> current
    }
    else {
        o->read( f );
    }
}

Disadvantages:

Some information may exist in v1, which was useless in v3, but v5 could make use of it; however, cascaded conversions have wiped out this data.
Older versions will tend to take longer to create objects.

Benefit:

You only have to write one convert_to() each time you change the Obj class. However, a single bug in any converter along the chain could have more severe effects and could wreck the consistency of the database. On the other hand, you have a better chance of finding such a bug, since every older version passes through that converter.

Worries:

Could it be that, conversion after conversion, objects from older versions accumulate so much noise that they end up wrong?

Question:

Are there any other patterns that do a better job at this?
For those of you who have some experience with my proposals, what do you think of my worries about the above implementations?
Which solutions are preferable?

thank you so much

Dummy00001 · Accepted Answer

void Read_Obj( File *f, Obj *o ) {
if( f->version() == File::Version1 ) {

The if is, so to speak, a hidden switch/case. And a switch/case in C++ is generally interchangeable with polymorphism. Example:

struct Reader {
   virtual ~Reader() {}   // readers are used through base pointers
   virtual void Read_Obj( File *f, Obj *o ) = 0;
   /* methods to read further objects */
};

struct ReaderV1 : public Reader {
   void Read_Obj( File *f, Obj *o ) { /* ... */ }
   /* methods to read further objects */
};

struct ReaderV2 : public Reader {
   void Read_Obj( File *f, Obj *o ) { /* ... */ }
   /* methods to read further objects */
};

And then instantiate the appropriate Reader descendant after opening the file and detecting the version number. That way you would have only one file version check in the top level code, instead of polluting all of the low-level code with the checks.

If code is common to several file versions, for convenience you can also put it into the base reader class.
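To make that concrete, here is a minimal sketch of the idea, not a definitive implementation. The File type is a simplified in-memory stand-in for the one in the question, and read_int(), the width member, the palette inside ColorIndex_to_RGB(), make_reader() and load() are all hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the question's File: a version tag plus a list of ints.
struct File {
    enum Version { Version1 = 1, Version2 = 2 };
    Version ver = Version2;
    std::vector<int> data;
    std::size_t pos = 0;

    Version version() const { return ver; }
    int read_int() {
        if (pos >= data.size()) throw std::runtime_error("unexpected end of file");
        return data[pos++];
    }
};

struct Obj {
    int color_r = 0, color_g = 0, color_b = 0;
    int width = 0;  // hypothetical member whose on-disk format never changed
};

// Toy color map (an assumption): index 0..3 -> black, red, green, blue.
void ColorIndex_to_RGB(Obj& o, int color_index) {
    static const int palette[4][3] = {{0, 0, 0}, {255, 0, 0}, {0, 255, 0}, {0, 0, 255}};
    if (color_index < 0 || color_index >= 4) throw std::out_of_range("bad color index");
    o.color_r = palette[color_index][0];
    o.color_g = palette[color_index][1];
    o.color_b = palette[color_index][2];
}

struct Reader {
    virtual ~Reader() = default;
    virtual void Read_Obj(File& f, Obj& o) = 0;
protected:
    // Code shared by all file versions lives in the base class.
    void read_unchanged_members(File& f, Obj& o) { o.width = f.read_int(); }
};

struct ReaderV1 : Reader {
    void Read_Obj(File& f, Obj& o) override {
        ColorIndex_to_RGB(o, f.read_int());  // v1 stored a color map index
        read_unchanged_members(f, o);
    }
};

struct ReaderV2 : Reader {
    void Read_Obj(File& f, Obj& o) override {
        o.color_r = f.read_int();
        o.color_g = f.read_int();
        o.color_b = f.read_int();
        read_unchanged_members(f, o);
    }
};

// The only place in the program that looks at the file version.
std::unique_ptr<Reader> make_reader(File::Version v) {
    switch (v) {
        case File::Version1: return std::make_unique<ReaderV1>();
        case File::Version2: return std::make_unique<ReaderV2>();
    }
    throw std::runtime_error("unsupported file version");
}

Obj load(File& f) {
    Obj o;
    make_reader(f.version())->Read_Obj(f, o);
    return o;
}
```

Adding a Version3 then means one new Reader subclass and one new case in make_reader(); the existing readers stay untouched.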

I would strongly advise against the variant with class Obj_v1 and class Obj where the read() method belongs to the Obj itself. That way one can easily end up with circular dependencies, and it is also a bad idea to make an object aware of its persistent representation. IME (in my experience) it is better to have a third-party reader class hierarchy responsible for that. (As with std::iostream vs. std::string vs. operator<<: the stream doesn't know the string, the string doesn't know the stream, only operator<< knows both.)

Otherwise, I personally do not see any big difference between your "Strategy 1" and "Strategy 2". They both use convert_to(), which I personally think is superfluous. IME the solution with polymorphism should be used instead: each reader converts directly to the up-to-date version of class Obj, without the intermediate classes Obj_v1 and Obj_v2. Since with polymorphism you have a dedicated read function for every version, ensuring proper object recreation from the read information is easy.

Are there any other patterns that do a better job at this? The ones of you that had some experience with my proposals, what do you think of my worries on the above implementations? Which are preferable solutions?

This is precisely what polymorphism was intended to address and how I generally do such tasks myself.

This is related to object serialization, but I have not seen a single serialization framework (my info is likely outdated) that was capable of supporting several versions of the same class.

I have personally ended up several times with the following serialization/deserialization class hierarchy:

abstract reader interface (very slim by definition)
utility classes implementing the reading and writing of the actual objects from/to the streams (fat, highly reusable code, was used for network transfers too)
versioned implementations of the reader interface (relatively slim, reuse the fat utility classes)
writer interface/class (I always wrote the up-to-date version of the file. Versioning was used only during reading.)
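As a rough illustration of how those four pieces can fit together, here is a small sketch under stated assumptions: Stream is a hypothetical in-memory stand-in for a file or socket, and the codec namespace, the gray-level legacy color encoding, the width member and upgrade_v1_record() are invented for the example rather than taken from the question.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory stream of ints, standing in for a file or a socket.
struct Stream {
    std::vector<int> data;
    std::size_t pos = 0;
    void put(int v) { data.push_back(v); }
    int get() {
        if (pos >= data.size()) throw std::runtime_error("unexpected end of stream");
        return data[pos++];
    }
};

struct Rgb { int r = 0, g = 0, b = 0; };
struct Obj { Rgb color; int width = 0; };  // width: hypothetical unchanged member

// "Fat" utility layer: encodes and decodes individual pieces, knows nothing about versions.
namespace codec {
    Rgb read_rgb(Stream& s) {
        Rgb c;
        c.r = s.get();
        c.g = s.get();
        c.b = s.get();
        return c;
    }
    void write_rgb(Stream& s, const Rgb& c) {
        s.put(c.r);
        s.put(c.g);
        s.put(c.b);
    }
    // Legacy encoding (an assumption): a single gray level instead of three channels.
    Rgb read_gray_index(Stream& s) {
        Rgb c;
        c.r = c.g = c.b = s.get();
        return c;
    }
}

// Slim reader interface.
struct ObjReader {
    virtual ~ObjReader() = default;
    virtual Obj read(Stream& s) const = 0;
};

// Versioned readers: thin, they only sequence utility calls for their layout.
struct ObjReaderV1 : ObjReader {
    Obj read(Stream& s) const override {
        Obj o;
        o.color = codec::read_gray_index(s);
        o.width = s.get();
        return o;
    }
};

struct ObjReaderV2 : ObjReader {
    Obj read(Stream& s) const override {
        Obj o;
        o.color = codec::read_rgb(s);
        o.width = s.get();
        return o;
    }
};

// Single writer: always emits the newest layout (V2 here).
struct ObjWriter {
    void write(Stream& s, const Obj& o) const {
        codec::write_rgb(s, o.color);
        s.put(o.width);
    }
};

// Upgrading an old record is "read with the old reader, write with the writer".
Stream upgrade_v1_record(Stream& old_data) {
    Obj o = ObjReaderV1().read(old_data);
    Stream out;
    ObjWriter().write(out, o);
    return out;
}
```

Because the writer only ever produces the newest layout, old files are upgraded as a side effect of loading and saving them, and no write code has to be kept around for old versions.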

Hope that helps.
