I have recently been tasked with exploring the use of protobuf-net within a performance-critical application. It currently uses Newtonsoft.Json, and for the most part protobuf-net has shown excellent performance gains, but in some cases memory allocations go through the roof and I am stuck on how to work out what is going on.
I've put together a small console application that replicates the problem (which was originally found by performance regression tests). I can't post that exact test for obvious reasons, but here is a similar example:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Newtonsoft.Json;
using ProtoBuf;
using ProtoBuf.Meta;

public class Program
{
    public static void Main(string[] args)
    {
        // Track CPU time and allocations for the whole AppDomain.
        AppDomain.MonitoringIsEnabled = true;

        var useProtoBuf = args.Length > 0;
        if (useProtoBuf)
        {
            Console.WriteLine("Using protobuf-net");
        }
        else
        {
            Console.WriteLine("Using json.net");
        }

        var runtimeTypeModel = TypeModel.Create();
        runtimeTypeModel.Add(typeof(TestResult), true);

        // Build a large object graph: one million small wrapper objects.
        var list = new List<Wrapper>();
        for (var index = 0; index < 1_000_000; index++)
        {
            list.Add(new Wrapper
            {
                Value = "C5CAD058-3A05-48EA-9626-A6B4F692B14E"
            });
        }
        var result = new TestResult
        {
            First = new CollectionWrapper
            {
                Collection = list
            }
        };

        // Serialize the same graph 25 times with the chosen serializer.
        for (var i = 0; i < 25; i++)
        {
            if (useProtoBuf)
            {
                using (var stream = File.Create(@"..\..\protobuf-net.bin"))
                {
                    runtimeTypeModel.Serialize(stream, result);
                }
            }
            else
            {
                using (var stream = File.CreateText(@"..\..\json.net.json"))
                using (var writer = new JsonTextWriter(stream))
                {
                    new JsonSerializer().Serialize(writer, result);
                }
            }
        }

        Console.WriteLine($"Took: {AppDomain.CurrentDomain.MonitoringTotalProcessorTime.TotalMilliseconds:#,###} ms");
        Console.WriteLine($"Allocated: {AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize / 1024:#,#} kb");
        Console.WriteLine($"Peak Working Set: {Process.GetCurrentProcess().PeakWorkingSet64 / 1024:#,#} kb");
    }

    [ProtoContract]
    public class Wrapper
    {
        [ProtoMember(1)]
        public string Value { get; set; }
    }

    [ProtoContract]
    public class TestResult
    {
        [ProtoMember(1)]
        public CollectionWrapper First { get; set; }
    }

    [ProtoContract]
    public class CollectionWrapper
    {
        [ProtoMember(1)]
        public List<Wrapper> Collection { get; set; } = new List<Wrapper>();
    }
}
I am using the following package versions:
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="Newtonsoft.Json" version="10.0.3" targetFramework="net47" />
<package id="protobuf-net" version="2.3.4" targetFramework="net47" />
</packages>
Here are my results:
Foo.exe
Using json.net
Took: 12,000 ms
Allocated: 20,436 kb
Peak Working Set: 36,332 kb
Foo.exe 1
Using protobuf-net
Took: 5,203 ms
Allocated: 3,296,838 kb
Peak Working Set: 137,044 kb
Any help would be appreciated.
Many thanks.
Upvotes: 2
Views: 1277
Reputation: 1063338
This is the result of length-prefixes forcing buffering. This is something that will be completely reworked in the next "major" release (I have the prototype code, it just isn't ready yet) to avoid this issue completely, using some cunning tricks to efficiently calculate the required values in advance.
In the interim, there is a way to avoid this buffering: use "groups". Basically, there are two ways of encoding sub-objects in protobuf - length-prefix (the default), or start/end sentinels. By comparison with JSON, you can think of these sentinels as the { and }, but in protobuf. To switch to this, add DataFormat = DataFormat.Group to all the sub-object [ProtoMember(...)] attributes, including on the collection members. This should radically cut the working set, but: it is a different data layout. Most protobuf libraries will work fine with groups if x-plat is a concern, but to be clear: Google have decided that groups===bad (which is a shame, I love them!), and they no longer exist in the proto3 schema syntax - they are in proto2, though.
At the technical level:
Google obviously prefer cheap reads at the expense of more expensive writes. This impacts the v2 engine of protobuf-net more than it impacts Google's library, because of how they pre-encode most data. The v3 engine will be "cured" of this issue, but I have no hard ETA on that (I've been experimenting with the upcoming corefx "pipelines" API for the v3 engine, but that isn't going to happen anytime soon; however, I want the v3 API to be suitable for use with "pipelines", hence the work now; most likely v3 will ship a long time before "pipelines").
For now, please try:
[ProtoContract]
public class Wrapper
{
    [ProtoMember(1)]
    public string Value { get; set; }
}

[ProtoContract]
public class TestResult
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public CollectionWrapper First { get; set; }
}

[ProtoContract]
public class CollectionWrapper
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Wrapper> Collection { get; set; } = new List<Wrapper>();
}
Upvotes: 1