Reputation: 7005
I have been keeping a large set of data as TEXT records in a TEXT file:
yyyyMMddTHHmmssfff double1 double2
However, when I read it back I need to parse each DateTime, which is quite slow for millions of records.
So, now I am trying it as a binary file, which I created by serializing my class.
That way I do not need to parse the DateTime.
[Serializable]
class MyRecord
{
    public DateTime DT;
    public double Price1;
    public double Price2;
}
public byte[] SerializeToByteArray()
{
    var bf = new BinaryFormatter();
    using (var ms = new MemoryStream())
    {
        bf.Serialize(ms, this);
        return ms.ToArray();
    }
}
var outBin = new BinaryWriter(File.Create(binFileName, 2048, FileOptions.None));
for (AllRecords) //Pseudo
{
    var mr = new MyRecord(); //Pseudo
    outBin.Write(mr.SerializeToByteArray());
}
The resulting binary is on average 3 times the size of the TEXT file.
Is that to be expected?
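For comparison, if the only goal is to avoid parsing the timestamp, a fixed-width layout written with plain BinaryWriter stores each record in exactly 24 bytes (the DateTime's Ticks as a long plus the two doubles), with no per-record metadata at all. A minimal sketch; the demo class name and sample values here are made up:

```csharp
using System;
using System.IO;

class FixedWidthDemo
{
    // Each record: 8-byte Ticks + two 8-byte doubles = 24 bytes,
    // with no per-record type metadata.
    static void Main()
    {
        var dt = new DateTime(2012, 1, 1);
        var ms = new MemoryStream();

        using (var w = new BinaryWriter(ms, System.Text.Encoding.UTF8, leaveOpen: true))
        {
            w.Write(dt.Ticks);   // DateTime round-trips exactly via Ticks
            w.Write(1.25);       // Price1
            w.Write(2.5);        // Price2
        }

        ms.Position = 0;
        using (var r = new BinaryReader(ms))
        {
            var back = new DateTime(r.ReadInt64());
            double p1 = r.ReadDouble(), p2 = r.ReadDouble();
            Console.WriteLine(ms.Length);                             // bytes per record
            Console.WriteLine(back == dt && p1 == 1.25 && p2 == 2.5); // exact round-trip
        }
    }
}
```

Reading is then a fixed-size `ReadInt64`/`ReadDouble` loop with no parsing, so it is typically much smaller and faster than the text file.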
EDIT 1
I am exploring using protobuf-net to help me:
I want to do this with a using statement, to fit my existing structure.
private void DisplayBtn_Click(object sender, EventArgs e)
{
    string fileName = dbDirectory + @"\nAD20120101.dat";
    FileStream fs = File.OpenRead(fileName);
    MyRecord tr;
    while (fs.CanRead)
    {
        tr = Serializer.Deserialize<MyRecord>(fs);
        Console.WriteLine("> " + tr.ToString());
    }
}
BUT after the first record, tr is full of zeroes.
Upvotes: 0
Views: 432
Reputation: 13794
As requested by the OP.
The output is not a plain binary file; it is a binary serialization of each instance, plus the overhead BinaryFormatter adds so it can deserialize it later. That is why you get a file roughly 3 times larger than expected. If you need a leaner serialization solution, take a look at protobuf-net: https://code.google.com/p/protobuf-net/
Here is how you can achieve this:
[ProtoContract]
public class MyRecord
{
    [ProtoMember(1)]
    DateTime DT;
    [ProtoMember(2)]
    double Price1;
    [ProtoMember(3)]
    double Price2;
}
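The "all zeroes after the first record" symptom in the question's EDIT comes from writing raw protobuf messages back-to-back with no framing, so the reader cannot tell where one record ends. protobuf-net's length-prefix helpers handle that. A sketch, assuming the protobuf-net package; the demo class name and sample values are illustrative:

```csharp
using System;
using System.IO;
using ProtoBuf;   // protobuf-net package

[ProtoContract]
public class MyRecord
{
    [ProtoMember(1)] public DateTime DT;
    [ProtoMember(2)] public double Price1;
    [ProtoMember(3)] public double Price2;
}

class LengthPrefixDemo
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Write each record with a length prefix so the reader knows
            // where one message ends and the next begins.
            for (int i = 0; i < 3; i++)
                Serializer.SerializeWithLengthPrefix(ms,
                    new MyRecord { DT = DateTime.Today, Price1 = i, Price2 = i },
                    PrefixStyle.Base128);

            ms.Position = 0;
            MyRecord rec;
            // DeserializeWithLengthPrefix returns null once the stream is exhausted,
            // which also replaces the question's while (fs.CanRead) loop condition.
            while ((rec = Serializer.DeserializeWithLengthPrefix<MyRecord>(
                       ms, PrefixStyle.Base128)) != null)
                Console.WriteLine($"{rec.DT:yyyyMMdd} {rec.Price1} {rec.Price2}");
        }
    }
}
```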
Upvotes: 0
Reputation: 1500
You are not storing a simple binary version of your DateTime, but a serialized object representing it. That is much larger than simply storing your date as text.
If you create a class
class MyRecords
{
    DateTime[] DT;
    double[] Price1;
    double[] Price2;
}
And serialize that, it should be much smaller.
Also, DateTime itself still needs quite a lot of space, so you could convert each DateTime to an integer Unix timestamp and store that instead.
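The conversion either way is just arithmetic against the 1970-01-01 UTC epoch; a small sketch (the helper names are illustrative):

```csharp
using System;

class UnixTimestampDemo
{
    static readonly DateTime Epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Store milliseconds since 1970-01-01 UTC as a single long
    // instead of a full serialized DateTime.
    static long ToUnixMillis(DateTime dt) =>
        (long)(dt.ToUniversalTime() - Epoch).TotalMilliseconds;

    static DateTime FromUnixMillis(long ms) => Epoch.AddMilliseconds(ms);

    static void Main()
    {
        var dt = new DateTime(2012, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        long ts = ToUnixMillis(dt);
        Console.WriteLine(ts);                       // epoch millis for 2012-01-01
        Console.WriteLine(FromUnixMillis(ts) == dt); // exact round-trip
    }
}
```

Note that whole-second or millisecond timestamps lose the sub-millisecond precision a raw DateTime (Ticks) carries, which may or may not matter for this data.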
Upvotes: 0
Reputation: 394044
Your archive likely has considerable overhead from serializing type information with each record.
Instead, make the whole collection serializable (if it isn't already) and serialize that in one go.
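The difference is easy to measure: serializing each record separately repeats the type metadata every time, while serializing the whole list writes it once. A rough sketch, assuming classic .NET Framework where BinaryFormatter is still supported (the demo names are made up):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

#pragma warning disable SYSLIB0011 // BinaryFormatter is obsolete on modern .NET

[Serializable]
class MyRecord
{
    public DateTime DT;
    public double Price1, Price2;
}

class OverheadDemo
{
    static long Size(object graph)
    {
        var bf = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            bf.Serialize(ms, graph);
            return ms.Length;
        }
    }

    static void Main()
    {
        var records = new List<MyRecord>();
        long perRecordTotal = 0;
        for (int i = 0; i < 1000; i++)
        {
            var r = new MyRecord { DT = DateTime.Now, Price1 = i, Price2 = i };
            records.Add(r);
            perRecordTotal += Size(r);   // type metadata repeated every record
        }
        long oneGo = Size(records);      // type metadata written once for the list
        Console.WriteLine(oneGo < perRecordTotal);
    }
}
```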
Upvotes: 1