user9244629

High memory allocations when using protobuf-net over json.net

I have recently been tasked with exploring the use of protobuf-net within a performance-critical application. It currently uses Newtonsoft.Json, and for the most part protobuf-net has shown excellent performance gains. But in some cases memory allocations go through the roof, and I am stuck on how to figure out what is going on.

I've put together a small console application that replicates the problem (the problem was originally found by performance regression tests). I can't post that exact test for obvious reasons, but here is a similar example:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Newtonsoft.Json;
using ProtoBuf;
using ProtoBuf.Meta;

public class Program
{
    public static void Main(string[] args)
    {
        AppDomain.MonitoringIsEnabled = true;
        var useProtoBuf = args.Length > 0;

        if (useProtoBuf)
        {
            Console.WriteLine("Using protobuf-net");
        }
        else
        {
            Console.WriteLine("Using json.net");
        }

        var runtimeTypeModel = TypeModel.Create();
        runtimeTypeModel.Add(typeof(TestResult), true);
        var list = new List<Wrapper>();

        for (var index = 0; index < 1_000_000; index++)
        {
            list.Add(new Wrapper
            {
                Value = "C5CAD058-3A05-48EA-9626-A6B4F692B14E"
            });
        }

        var result = new TestResult
        {
            First = new CollectionWrapper
            {
                Collection = list
            }
        };

        for (var i = 0; i < 25; i++)
        {
            if (useProtoBuf)
            {
                using (var stream = File.Create(@"..\..\protobuf-net.bin"))
                {
                    runtimeTypeModel.Serialize(stream, result);
                }
            }
            else
            {
                using (var stream = File.CreateText(@"..\..\json.net.json"))
                using (var writer = new JsonTextWriter(stream))
                {
                    new JsonSerializer().Serialize(writer, result);
                }
            }
        }

        Console.WriteLine($"Took: {AppDomain.CurrentDomain.MonitoringTotalProcessorTime.TotalMilliseconds:#,###} ms");
        Console.WriteLine($"Allocated: {AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize / 1024:#,#} kb");
        Console.WriteLine($"Peak Working Set: {Process.GetCurrentProcess().PeakWorkingSet64 / 1024:#,#} kb");
    }

    [ProtoContract]
    public class Wrapper
    {
        [ProtoMember(1)]
        public string Value { get; set; }
    }

    [ProtoContract]
    public class TestResult
    {
        [ProtoMember(1)]
        public CollectionWrapper First { get; set; }
    }

    [ProtoContract]
    public class CollectionWrapper
    {
        [ProtoMember(1)]
        public List<Wrapper> Collection { get; set; } = new List<Wrapper>();
    }
}

I am using the following package versions:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Newtonsoft.Json" version="10.0.3" targetFramework="net47" />
  <package id="protobuf-net" version="2.3.4" targetFramework="net47" />
</packages>

Here are my results:

Foo.exe
Using json.net
Took: 12,000 ms
Allocated: 20,436 kb
Peak Working Set: 36,332 kb

Foo.exe 1
Using protobuf-net
Took: 5,203 ms
Allocated: 3,296,838 kb
Peak Working Set: 137,044 kb

Any help would be appreciated.

Many thanks.

Upvotes: 2

Views: 1277

Answers (1)

Marc Gravell

This is the result of the length-prefix encoding forcing buffering. This is something that will be completely reworked in the next "major" release (I have the prototype code, it just isn't ready yet) to avoid this issue completely, using some cunning tricks to efficiently calculate the required lengths in advance.

In the interim, there is a way to prevent this buffering: use "groups". Basically, there are two ways of encoding sub-objects in protobuf: length-prefix (the default), or start/end sentinels. Compared with JSON, you can think of these sentinels as the { and }, but in protobuf. To switch to this, add DataFormat = DataFormat.Group to all the sub-object [ProtoMember(...)] attributes, including on the collection members. This should radically cut the working set, but: it is a different data layout. Most protobuf libraries will work fine with groups if x-plat is a concern, but to be clear: Google have decided that groups===bad (which is a shame, I love them!), and they no longer exist in the proto3 schema syntax - they are still in proto2, though.
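As an aside: if editing the attributes isn't convenient (for example, the contract types live in an assembly you can't change), the same switch can usually be made on the runtime model instead; a minimal sketch, assuming protobuf-net v2's `RuntimeTypeModel`/`ValueMember` API and the field numbers from the contracts in the question:

```csharp
using ProtoBuf;
using ProtoBuf.Meta;

// Build a model and flip the sub-object members to group encoding
// without touching the [ProtoMember] attributes; the [1] indexer
// looks up the member by its protobuf field number.
var model = TypeModel.Create();
model.Add(typeof(TestResult), true);
model.Add(typeof(CollectionWrapper), true);
model[typeof(TestResult)][1].DataFormat = DataFormat.Group;
model[typeof(CollectionWrapper)][1].DataFormat = DataFormat.Group;

// Use this model for Serialize/Deserialize instead of the default one.
```

Note this must happen before the model is used (or compiled), since members are frozen once serialization starts.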

At the technical level:

  • length-prefix is more expensive to write (since it needs to be pre-calculated), but makes it very cheap to check you have an entire frame to decode
  • sentinels are ridiculously cheap to write, but make it more difficult to check you have an entire frame to decode (since you need to sanity-check on a per-field basis)
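To make the trade-off concrete, here is a hand-rolled sketch of the two encodings of the same tiny sub-message, using nothing but the published protobuf wire-format rules (tag byte = field number << 3 | wire type; wire type 2 is length-delimited, 3/4 are the start/end-group sentinels) - no serializer library involved:

```csharp
using System;
using System.Collections.Generic;

public static class WireDemo
{
    // Protobuf tag = (fieldNumber << 3) | wireType.
    // Wire types: 2 = length-delimited, 3 = start-group, 4 = end-group.
    static byte Tag(int field, int wireType) => (byte)((field << 3) | wireType);

    public static void Main()
    {
        // Inner message: field 1 (string) = "ab"  ->  0x0A 0x02 'a' 'b'
        byte[] inner = { Tag(1, 2), 2, (byte)'a', (byte)'b' };

        // Length-prefixed sub-object (field 1 of the outer message):
        // the writer must know inner.Length BEFORE it can emit the prefix,
        // which is what forces the v2 engine to buffer the payload.
        var lengthPrefixed = new List<byte> { Tag(1, 2), (byte)inner.Length };
        lengthPrefixed.AddRange(inner);

        // Group-encoded sub-object: start/end sentinels, no length needed,
        // so the payload can be streamed out as it is produced.
        var grouped = new List<byte> { Tag(1, 3) };
        grouped.AddRange(inner);
        grouped.Add(Tag(1, 4));

        Console.WriteLine(BitConverter.ToString(lengthPrefixed.ToArray()));
        // 0A-04-0A-02-61-62
        Console.WriteLine(BitConverter.ToString(grouped.ToArray()));
        // 0B-0A-02-61-62-0C
    }
}
```

The two byte streams carry the same payload; only the framing differs, which is why the reader needs per-field sanity checks with groups but gets the frame size for free with length-prefix.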

Google obviously prefer cheap reads at the expense of more expensive writes. This impacts the v2 engine of protobuf-net more than it impacts Google's library, because of how Google's library pre-encodes most data. The v3 engine will be "cured" of this issue, but I have no hard ETA on that. I've been experimenting with the upcoming corefx "pipelines" API for the v3 engine, but that isn't going to happen anytime soon; however, I want the v3 API to be suitable for use with "pipelines", hence the work now. Most likely v3 will ship a long time before "pipelines".

For now, please try:

[ProtoContract]
public class Wrapper
{
    [ProtoMember(1)]
    public string Value { get; set; }
}

[ProtoContract]
public class TestResult
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public CollectionWrapper First { get; set; }
}

[ProtoContract]
public class CollectionWrapper
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Wrapper> Collection { get; set; } = new List<Wrapper>();
}

Upvotes: 1
