Reputation: 131162
What are the deficiencies of the built-in BinaryFormatter based .Net serialization? (Performance, flexibility, restrictions)
Please accompany your answer with some code if possible.
Example:
Custom objects being serialized must be decorated with the [Serializable] attribute or implement the ISerializable interface.
Less obvious example:
Anonymous types cannot be serialized.
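A minimal sketch of the first example (the NotMarked type is illustrative): serializing a type that is neither decorated with [Serializable] nor implements ISerializable fails at run time.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Not marked [Serializable], and does not implement ISerializable.
class NotMarked
{
    public int Value;
}

class Program
{
    static void Main()
    {
        try
        {
            new BinaryFormatter().Serialize(new MemoryStream(), new NotMarked { Value = 1 });
        }
        catch (SerializationException ex)
        {
            // "Type 'NotMarked' ... is not marked as serializable."
            Console.WriteLine(ex.Message);
        }
    }
}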
Upvotes: 18
Views: 5607
Reputation: 1064104
If you mean BinaryFormatter: it is based on a type's private implementation details (its fields), which makes it very intolerant of version changes, and it has traps such as serializing the subscribers of events unless the delegate backing fields are marked as non-serialized.
I've spent lots of time in this area, including writing a (free) implementation of Google's "protocol buffers" serialization API for .NET: protobuf-net. This produces smaller, faster output, but crucially it also works with the existing framework mechanisms: ISerializable (for remoting etc.) and WCF.
Upvotes: 24
Reputation: 759
Here is another situation that causes the BinaryFormatter to throw an exception.
[Serializable]
class SerializeMe
{
public List<Data> _dataList;
public string _name;
}
[Serializable]
class Data
{
public int _t;
}
Imagine SerializeMe gets serialized today. Tomorrow we decide we no longer need class Data and remove it. Accordingly, we modify the SerializeMe class to remove the List. It is now impossible to deserialize the old version of a SerializeMe object.
The solution is either to create a custom BinaryFormatter that properly ignores the extra classes, or to keep class Data around with an empty definition (there is no need to keep the List member), as sketched below.
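A sketch of the second workaround, following the description above (this "version 2" of the classes is hypothetical):

[Serializable]
class SerializeMe
{
    // _dataList has been removed; old streams can still be deserialized
    // because the formatter can resolve the Data type name they mention.
    public string _name;
}

[Serializable]
class Data
{
    // Empty definition kept around purely so old data can be loaded.
}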
Upvotes: 0
Reputation: 11
I concur with the last answer. The performance is pretty poor.
Recently, my team of coders finished converting a simulation from standard C++ to C++/CLI. Under C++ we had a hand-written persistence mechanism that worked reasonably well. We decided to use the built-in serialization mechanism rather than rewrite the old persistence mechanism.
The old simulation, with a memory footprint between 0.5 and 1 GB, most objects holding pointers to other objects, and thousands of objects at runtime, would persist to a binary file of about 10 to 15 MB in under a minute. Restoring from the file was comparable.
Using the same data files (running side by side), the running performance of the C++/CLI version is about twice that of the C++ version, until we do the persistence (serialization in the new version): writing out takes between 3 and 5 minutes, and reading in takes between 10 and 20. The serialized files are about 5 times the size of the old ones.
Basically, we see a 19-fold increase in read time and a 5-fold increase in write time. This is unacceptable, and we are looking for ways to correct it.
In examining the binary files I discovered a few things:
1. The type and assembly data is written in clear text for all types. This is space-inefficient.
2. Every object/instance of every type has this bloated type/assembly information written out.
One thing we did in our hand-written persistence mechanism was maintain a table of known types. As we discovered types during writing, we looked each one up in this table; if it did not exist, an entry was created with the type info and assigned an index, and from then on we passed the type info as that integer (type, data, type, data). This trick cut the size down tremendously. It may require going through the data twice, but an on-the-fly variant could be developed, whereby the type info is pushed to the stream at the same moment it is added to the table, as long as the order of restoration from the stream can be guaranteed (see the sketch below).
I was hoping to re-implement some of the core serialization to optimize it this way, but, alas, the classes are sealed! We may yet find a way to jerry-rig it.
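A minimal sketch of the on-the-fly variant of that type table (the class and method names here are illustrative, not from any library):

using System;
using System.Collections.Generic;
using System.IO;

class TypeTableWriter
{
    private readonly Dictionary<Type, int> _typeIds = new Dictionary<Type, int>();
    private readonly BinaryWriter _writer;

    public TypeTableWriter(BinaryWriter writer)
    {
        _writer = writer;
    }

    public void WriteTypeRef(Type type)
    {
        int id;
        if (_typeIds.TryGetValue(type, out id))
        {
            // Known type: emit only its small integer id.
            _writer.Write(id);
        }
        else
        {
            // New type: it always receives id == current table size, so a
            // reader replaying the stream in order knows that an id equal
            // to its own table size is new and is followed by the name.
            id = _typeIds.Count;
            _typeIds[type] = id;
            _writer.Write(id);
            _writer.Write(type.AssemblyQualifiedName);
        }
    }
}

The matching reader keeps a List<Type> and, whenever it reads an id equal to the list's current count, reads the name that follows and appends the resolved type. This is why the guaranteed order of restoration is essential.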
Upvotes: 1
Reputation: 131162
A slightly less obvious one is that performance is pretty poor for Object serialization.
Time to serialize and deserialize 100,000 objects on my machine:
Full Serialization Cycle: BinaryFormatter Int[100000] (Time Elapsed 3 ms)
Full Serialization Cycle: BinaryFormatter NumberObject[100000] (Time Elapsed 1246 ms)
Full Serialization Cycle: Manual NumberObject[100000] (Time Elapsed 54 ms)
In this simple example, serializing an object with a single int field is about 20x slower than doing it by hand. Granted, there is some type information in the serialized stream, but that hardly accounts for the 20x slowdown.
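The original benchmark code isn't shown here, but a rough sketch of the kind of comparison involved (NumberObject and the 100,000 count are taken from the output above; everything else is illustrative) could look like this:

using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class NumberObject
{
    public int Value;
}

class Program
{
    static void Main()
    {
        var items = new NumberObject[100000];
        for (int i = 0; i < items.Length; i++)
            items[i] = new NumberObject { Value = i };

        // Round-trip through BinaryFormatter.
        var sw = Stopwatch.StartNew();
        using (var ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, items);
            ms.Position = 0;
            var back = (NumberObject[])new BinaryFormatter().Deserialize(ms);
            Console.WriteLine("BinaryFormatter: {0} items, {1} ms", back.Length, sw.ElapsedMilliseconds);
        }

        // Round-trip by hand with BinaryWriter/BinaryReader.
        sw = Stopwatch.StartNew();
        using (var ms = new MemoryStream())
        {
            var writer = new BinaryWriter(ms);
            writer.Write(items.Length);
            foreach (var item in items)
                writer.Write(item.Value);
            writer.Flush();

            ms.Position = 0;
            var reader = new BinaryReader(ms);
            var back = new NumberObject[reader.ReadInt32()];
            for (int i = 0; i < back.Length; i++)
                back[i] = new NumberObject { Value = reader.ReadInt32() };
            Console.WriteLine("Manual: {0} items, {1} ms", back.Length, sw.ElapsedMilliseconds);
        }
    }
}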
Upvotes: 0
Reputation: 19800
Types being serialized must be decorated with the [Serializable] attribute.
If you mean the members within a class, you are wrong: fields are serialized automatically once the type itself is marked (BinaryFormatter works on fields, public and private alike, not on properties).
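A quick illustration (my own sketch, not from the thread): the attribute goes on the type, and both the public and the private field round-trip without any further decoration.

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Point
{
    public int X;     // serialized automatically
    private int _y;   // private fields are serialized too

    public Point(int x, int y) { X = x; _y = y; }
    public int Y { get { return _y; } }
}

class Program
{
    static void Main()
    {
        var ms = new MemoryStream();
        new BinaryFormatter().Serialize(ms, new Point(1, 2));
        ms.Position = 0;
        var copy = (Point)new BinaryFormatter().Deserialize(ms);
        Console.WriteLine("{0}, {1}", copy.X, copy.Y); // 1, 2
    }
}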
Upvotes: 0
Reputation: 416131
Another issue that came to mind:
The XmlSerializer classes live in a completely different place from the generic run-time formatters, and while they are very similar to use, XmlSerializer does not implement the IFormatter interface. You can't write code that simply swaps the serialization formatter in or out at run time between BinaryFormatter, XmlSerializer, or a custom formatter without jumping through some extra hoops.
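One such hoop, as a sketch (the adapter class is hypothetical): wrapping XmlSerializer behind IFormatter yourself. Doing so also exposes the mismatch that XmlSerializer needs the target type up front, while IFormatter never asks for one.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml.Serialization;

// Wraps XmlSerializer so it can be swapped in wherever an IFormatter
// is expected.
public class XmlFormatterAdapter : IFormatter
{
    private readonly XmlSerializer _serializer;

    public XmlFormatterAdapter(Type type)
    {
        // XmlSerializer must know the type at construction time;
        // IFormatter implementations like BinaryFormatter never need this.
        _serializer = new XmlSerializer(type);
    }

    public SerializationBinder Binder { get; set; }
    public StreamingContext Context { get; set; }
    public ISurrogateSelector SurrogateSelector { get; set; }

    public void Serialize(Stream stream, object graph)
    {
        _serializer.Serialize(stream, graph);
    }

    public object Deserialize(Stream stream)
    {
        return _serializer.Deserialize(stream);
    }
}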
Upvotes: 1
Reputation: 18013
It isn't guaranteed that you can serialize objects back and forth between different Framework versions (say 1.0, 1.1, 3.5) or even different CLR implementations (Mono); again, XML is better for this purpose.
Upvotes: 1
Reputation: 2134
Versioning of data is handled through attributes. If you aren't worried about versioning then this is no problem. If you are, it is a huge problem.
The trouble with the attribute scheme is that it works pretty slickly for many trivial cases (such as adding a new property) but breaks down pretty rapidly when you try to do something like replace two enum values with a different, new enum value (or any number of common scenarios that come with long-lived persistent data).
I could go into lots of details describing the troubles. In the end, writing your own serializer is pretty darn easy if you need to...
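For contrast, here is a sketch of the trivial case the attributes do handle well (the Customer type is illustrative): a field added in version 2 is marked optional, so version-1 streams still deserialize, leaving the new field at its default. There is no comparable attribute for collapsing or renaming enum values.

using System;
using System.Runtime.Serialization;

[Serializable]
class Customer
{
    public string Name;

    // Added in version 2: old streams lacking this field still load.
    [OptionalField(VersionAdded = 2)]
    public string Nickname;
}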
Upvotes: 2
Reputation: 12502
If you change the type you're serializing, all the old data you've serialized and stored is broken. If you had stored it in a database, or even as XML, the old data would be easier to convert to the new format.
Upvotes: 1
Reputation: 416131
Given any random object, it's very difficult to prove whether it really is serializable.
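A sketch of why (the Wrapper type is mine): Type.IsSerializable only inspects the attribute on the static type, so the only reliable proof is to actually run the formatter over the concrete object graph.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Wrapper
{
    // The static type checks out, but this field can hold anything.
    public object Payload;
}

class Program
{
    static void Main()
    {
        Console.WriteLine(typeof(Wrapper).IsSerializable); // True

        var w = new Wrapper { Payload = new { X = 1 } }; // anonymous type
        try
        {
            new BinaryFormatter().Serialize(new MemoryStream(), w);
        }
        catch (SerializationException ex)
        {
            // Only running the formatter reveals the failure.
            Console.WriteLine(ex.Message);
        }
    }
}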
Upvotes: 3