lukebuehler

Reputation: 4081

How to squeeze the most performance out of protobuf-net

I want to use protobuf-net to serialize stock market data. I'm playing around with the following message model:

1st message: Meta Data describing what data to expect and some other info.
2nd message: DataBegin
3rd message: DataItem
4th message: DataItem
...
nth message: EndData

Here's an example of a Data Item:

class Bar {
    DateTime DateTime { get; set; }
    float Open { get; set; }
    float High { get; set; }
    float Low { get; set; }
    float Close { get; set; }
    int Volume { get; set; }
}

Right now I'm using TypeModel.SerializeWithLengthPrefix(...) to serialize each message (the TypeModel is compiled). This works well, but it's about 10x slower than serializing each message manually using a BinaryWriter. What matters here, of course, is not the metadata but the serialization of each DataItem. I have a lot of data, and in some cases it's read from or written to a file, where performance is crucial.

What would be a good way of increasing the performance of the serialization and deserialization of each DataItem?

Should I use ProtoWriter directly here? If so, how would I do this? (I'm a bit new to Protocol Buffers.)

Upvotes: 3

Views: 1153

Answers (1)

Marc Gravell

Reputation: 1062865

Yes: if your data is a very simple set of homogeneous records with no additional requirements (for example, it doesn't need to be forwards compatible or version elegantly, or be usable from clients that don't fully know all the data), it doesn't need to be conveniently portable, and you don't mind implementing all the serialization manually, then you can do it more efficiently by hand. In a quick test:

protobuf-net serialize: 55ms, 3581680 bytes
protobuf-net deserialize: 65ms, 100000 items
BinaryFormatter serialize: 443ms, 4200629 bytes
BinaryFormatter deserialize: 745ms, 100000 items
manual serialize: 26ms, 2800004 bytes
manual deserialize: 32ms, 100000 items

The extra space is presumably the field markers (which you don't need if you are packing the records manually and don't need to worry about different versions of the API in use at the same time).
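As a rough check on those sizes (a sketch, not part of the test rig): each manual record is an 8-byte DateTime, four 4-byte floats and a 4-byte int, so 28 bytes, plus a single 4-byte count prefix for the whole list. The class and method names here are made up for illustration.

```csharp
using System;
using System.IO;

public static class SizeCheck
{
    // Hypothetical helper: bytes produced by the manual format for `count` bars.
    public static long ManualSize(int count)
    {
        // 8 (DateTime as long) + 4 * 4 (floats) + 4 (int volume) = 28 bytes
        const int recordSize = sizeof(long) + 4 * sizeof(float) + sizeof(int);
        return sizeof(int) + (long)count * recordSize; // count prefix + records
    }

    // Writes one record with BinaryWriter, field by field, and measures it.
    public static long MeasureOneRecord()
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(new DateTime(2010, 1, 1).ToBinary()); // long
            writer.Write(1.0f); // Open
            writer.Write(2.0f); // High
            writer.Write(0.5f); // Low
            writer.Write(1.5f); // Close
            writer.Write(1000); // Volume
            writer.Flush();
            return ms.Length;
        }
    }
}
```

SizeCheck.ManualSize(100000) gives 2800004, matching the figure above. The protobuf-net output of 3581680 bytes works out to about 35.8 bytes per item, so roughly 7.8 bytes more per record on average.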

I certainly don't reproduce "10x"; I get about 2x, which isn't bad considering the things that protobuf offers. It is also a lot better than BinaryFormatter, which is more like 20x slower than the manual code! Here are some of the features:

  • version tolerance
  • portability
  • schema usage
  • no manual code
  • inbuilt support for sub-objects and collections
  • support for omitting default values
  • support for common .NET scenarios (serialization callbacks; conditional serialization patterns, etc)
  • inheritance (protobuf-net only; not part of the standard protobuf spec)

It sounds like manual serialization is the thing to do in your scenario; that's fine, I'm not offended ;p The purpose of a serialization library is to address the more general problem in a way that doesn't need manual code writing.
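If manual serialization is the route taken, one way it might fit the message-by-message model from the question is to put a one-byte marker in front of each record and a different marker at the end, so the item count need not be known up front. This is only a sketch under that assumption: the marker values and the BarStream class are invented for illustration, and Bar is redeclared so the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Bar
{
    public DateTime DateTime { get; set; }
    public float Open { get; set; }
    public float High { get; set; }
    public float Low { get; set; }
    public float Close { get; set; }
    public int Volume { get; set; }
}

public static class BarStream
{
    // Hypothetical marker bytes: one precedes every item, the other ends the data.
    const byte ItemMarker = 1;
    const byte EndMarker = 0;

    public static void Write(Stream stream, IEnumerable<Bar> bars)
    {
        // The writer is flushed rather than disposed so the stream stays open.
        var writer = new BinaryWriter(stream);
        foreach (var bar in bars)
        {
            writer.Write(ItemMarker);
            writer.Write(bar.DateTime.ToBinary());
            writer.Write(bar.Open);
            writer.Write(bar.High);
            writer.Write(bar.Low);
            writer.Write(bar.Close);
            writer.Write(bar.Volume);
        }
        writer.Write(EndMarker);
        writer.Flush();
    }

    public static IEnumerable<Bar> Read(Stream stream)
    {
        var reader = new BinaryReader(stream);
        // Items are yielded one at a time until the end marker is seen.
        while (reader.ReadByte() == ItemMarker)
        {
            yield return new Bar
            {
                DateTime = DateTime.FromBinary(reader.ReadInt64()),
                Open = reader.ReadSingle(),
                High = reader.ReadSingle(),
                Low = reader.ReadSingle(),
                Close = reader.ReadSingle(),
                Volume = reader.ReadInt32()
            };
        }
    }
}
```

Each item costs 29 bytes in this layout (the 28-byte record plus its marker). Like any hand-rolled format, it has no field tags, so changing the shape of Bar later would break existing files.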

My test rig:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using ProtoBuf;
using ProtoBuf.Meta;
using System.Runtime.Serialization.Formatters.Binary;

public static class Program
{
    static void Main()
    {

        var model = RuntimeTypeModel.Create();
        model.Add(typeof(BarWrapper), true);
        model.Add(typeof(Bar), true);
        model.CompileInPlace();

        var data = CreateBar(100000).ToList();
        RunTest(model, data);

    }

    private static void RunTest(RuntimeTypeModel model, List<Bar> data)
    {
        using(var ms = new MemoryStream())
        {
            var watch = Stopwatch.StartNew();
            model.Serialize(ms, new BarWrapper {Bars = data});
            watch.Stop();
            Console.WriteLine("protobuf-net serialize: {0}ms, {1} bytes", watch.ElapsedMilliseconds, ms.Length);

            ms.Position = 0;
            watch = Stopwatch.StartNew();
            var bars = ((BarWrapper) model.Deserialize(ms, null, typeof (BarWrapper))).Bars;
            watch.Stop();
            Console.WriteLine("protobuf-net deserialize: {0}ms, {1} items", watch.ElapsedMilliseconds, bars.Count);
        }
        using (var ms = new MemoryStream())
        {
            var bf = new BinaryFormatter();
            var watch = Stopwatch.StartNew();
            bf.Serialize(ms, new BarWrapper { Bars = data });
            watch.Stop();
            Console.WriteLine("BinaryFormatter serialize: {0}ms, {1} bytes", watch.ElapsedMilliseconds, ms.Length);

            ms.Position = 0;
            watch = Stopwatch.StartNew();
            var bars = ((BarWrapper)bf.Deserialize(ms)).Bars;
            watch.Stop();
            Console.WriteLine("BinaryFormatter deserialize: {0}ms, {1} items", watch.ElapsedMilliseconds, bars.Count);
        }
        byte[] raw;
        using (var ms = new MemoryStream())
        {
            var watch = Stopwatch.StartNew();
            WriteBars(ms, data);
            watch.Stop();
            raw = ms.ToArray();
            Console.WriteLine("manual serialize: {0}ms, {1} bytes", watch.ElapsedMilliseconds, raw.Length);
        }
        using(var ms = new MemoryStream(raw))
        {
            var watch = Stopwatch.StartNew();
            var bars = ReadBars(ms);
            watch.Stop();
            Console.WriteLine("manual deserialize: {0}ms, {1} items", watch.ElapsedMilliseconds, bars.Count);            
        }

    }
    static IList<Bar> ReadBars(Stream stream)
    {
        using(var reader = new BinaryReader(stream))
        {
            int count = reader.ReadInt32();
            var bars = new List<Bar>(count);
            while(count-- > 0)
            {
                var bar = new Bar();
                bar.DateTime = DateTime.FromBinary(reader.ReadInt64());
                // the prices were written as float, so read them back with ReadSingle
                bar.Open = reader.ReadSingle();
                bar.High = reader.ReadSingle();
                bar.Low = reader.ReadSingle();
                bar.Close = reader.ReadSingle();
                bar.Volume = reader.ReadInt32();
                bars.Add(bar);
            }
            return bars;
        }
    }
    static void WriteBars(Stream stream, IList<Bar> bars )
    {
        using(var writer = new BinaryWriter(stream))
        {
            writer.Write(bars.Count);
            foreach (var bar in bars)
            {
                writer.Write(bar.DateTime.ToBinary());
                writer.Write(bar.Open);
                writer.Write(bar.High);
                writer.Write(bar.Low);
                writer.Write(bar.Close);
                writer.Write(bar.Volume);
            }
        }

    }
    static IEnumerable<Bar> CreateBar(int count)
    {
        var rand = new Random(12345);
        while(count-- > 0)
        {
            var bar = new Bar();
            bar.DateTime = new DateTime(
                rand.Next(2008,2011), rand.Next(1,13), rand.Next(1, 29),
                rand.Next(0,24), rand.Next(0,60), rand.Next(0,60));
            bar.Open = (float) rand.NextDouble();
            bar.High = (float)rand.NextDouble();
            bar.Low = (float)rand.NextDouble();
            bar.Close = (float)rand.NextDouble();
            bar.Volume = rand.Next(-50000, 50000);
            yield return bar;
        }
    }

}
[ProtoContract]
[Serializable] // just for BinaryFormatter test
public class BarWrapper
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Bar> Bars { get; set; } 
}
[ProtoContract]
[Serializable] // just for BinaryFormatter test
public class Bar
{
    [ProtoMember(1)]
    public DateTime DateTime { get; set; }

    [ProtoMember(2)]
    public float Open { get; set; }

    [ProtoMember(3)]
    public float High { get; set; }

    [ProtoMember(4)]
    public float Low { get; set; }

    [ProtoMember(5)]
    public float Close { get; set; }

    // use zigzag if it can be -ve/+ve, or default if non-negative only
    [ProtoMember(6, DataFormat = DataFormat.ZigZag)]
    public int Volume { get; set; }
}
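To illustrate the zigzag comment on Volume: in the protobuf wire format, a negative int32 written as a plain varint is sign-extended and takes ten bytes, whereas zigzag maps small negative and positive values alike to small unsigned numbers. The sketch below shows the standard 32-bit zigzag mapping and a varint length count; it is a standalone illustration with made-up names, not protobuf-net's own code.

```csharp
using System;

public static class ZigZagSketch
{
    // Interleaves signed values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    public static uint Encode(int value)
    {
        return unchecked((uint)((value << 1) ^ (value >> 31)));
    }

    // Inverse of Encode.
    public static int Decode(uint encoded)
    {
        return (int)(encoded >> 1) ^ -(int)(encoded & 1);
    }

    // Bytes needed to write `value` as a base-128 varint (7 payload bits per byte).
    public static int VarintLength(ulong value)
    {
        int length = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            length++;
        }
        return length;
    }
}
```

For a Volume of -1, the sign-extended value needs VarintLength(unchecked((ulong)(long)-1)) = 10 bytes, while VarintLength(ZigZagSketch.Encode(-1)) is 1. The price is that non-negative values use one extra bit, which is why the default format is the better choice when the field can never be negative.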

Upvotes: 3
