How do I check if a class file has been changed before serializing it?

Question

We have a custom serialization process for a large number of C# types. However, regenerating the serialization output for every type is time-consuming, so we plan to optimize it by computing a hash of each source file and regenerating the serialized output only when the hash differs; otherwise we skip that type. EDIT: We can store the hashes in a Dictionary, write it to a file, and read it back on the next run. That's the current idea.

Our current serialization processor works as follows - we add the types to be serialized to a repo:

SerializerRepo.Add(typeof(MyType)); //Add type to be serialized to a repo

Then (possibly elsewhere in the code) we have the serializer process the repo and output the custom XML files:

Serializer.WriteXML(SerializerRepo.GetTypes());

WriteXML goes through each type and spews out an XML file for it at a particular location. I need to optimize the WriteXML method so that it only serializes a class/type if it has changed, and otherwise leaves the existing file alone.

This may not be the best way to do it, and I'm open to refactoring suggestions. However, the current problem is how to determine whether the class definition (or the file housing the class/type) has changed, so that we can decide whether the XML should be regenerated.

There is no inherent relation between a type and its source file: a class can be partial and span several files, so .NET has no mapping from types to class files or vice versa. We don't have any partial classes, but in our case we still seem to need two (albeit unrelated) pieces of information: the file housing the type/class and the type itself.

Two (possibly sub-optimal) ideas so far:

Have the user specify the file name along with the type. But that wouldn't survive any refactoring where the file name is changed.
Manually read each .cs file, parse it for public class declarations, and map each one to its type. That seems like a huge overhead, and I'm not sure it's a reliable way to do it.

These are the only two ideas that I have but nothing concrete. Suggestions?

Jerry Federspiel · Accepted Answer

Separate the generation of XML in-memory from persisting it to disk.

Keep a dictionary from fully-qualified class names to hashes. On your first run, the dictionary will start out empty.

When it is time to ensure that a class's corresponding XML is up to date on disk, generate its XML in-memory, hash that, and check the hash against the dictionary. If the class's name is not in the dictionary or if its hash disagrees with the hash in the dictionary, persist the generated XML and update the dictionary with the new hash.
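The check described above could be sketched roughly as follows. This is only an illustration: `XmlCache`, `Hash`, and `WriteIfChanged` are hypothetical names, SHA-256 is an arbitrary choice of hash, and producing the XML string in memory is assumed to be something the existing serializer can be refactored to do. The `File.Exists` guard is an extra assumption (rewrite if the output file went missing) that the answer itself does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; not part of the asker's Serializer.
public static class XmlCache
{
    // Hex-encoded SHA-256 of the generated XML text.
    public static string Hash(string xml)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(xml));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Writes xml to path only when the type is unknown, its hash changed,
    // or the output file is missing (assumed extra guard).
    // Returns true when the file was (re)written.
    public static bool WriteIfChanged(
        IDictionary<string, string> hashes, string typeName, string xml, string path)
    {
        string newHash = Hash(xml);
        string oldHash;
        if (hashes.TryGetValue(typeName, out oldHash)
            && oldHash == newHash
            && File.Exists(path))
        {
            return false; // up to date, skip the disk write
        }

        File.WriteAllText(path, xml);
        hashes[typeName] = newHash;
        return true;
    }
}
```

A call might look like `XmlCache.WriteIfChanged(hashes, typeof(MyType).FullName, xml, outputPath)`, where `xml` and `outputPath` come from whatever the serializer already computes for that type.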

After you've gone through this process with all your types, you'll have a full dictionary of hashes. Persist that to disk and load it the next time you run this program.
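Persisting the dictionary between runs could be as simple as the sketch below. `HashStore` and the tab-separated file layout are assumptions made for illustration; any serialization format would do.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: stores one "typeName<TAB>hash" pair per line.
public static class HashStore
{
    // Returns an empty dictionary on the first run, when no file exists yet.
    public static Dictionary<string, string> Load(string path)
    {
        var hashes = new Dictionary<string, string>();
        if (!File.Exists(path))
        {
            return hashes;
        }

        foreach (string line in File.ReadAllLines(path))
        {
            string[] parts = line.Split('\t');
            if (parts.Length == 2)
            {
                hashes[parts[0]] = parts[1]; // malformed lines are ignored
            }
        }
        return hashes;
    }

    // Overwrites the file with the current set of hashes.
    public static void Save(string path, IDictionary<string, string> hashes)
    {
        File.WriteAllLines(path, hashes.Select(kv => kv.Key + "\t" + kv.Value));
    }
}
```

The program would call `HashStore.Load` once at startup, process every type, and call `HashStore.Save` at the end.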
