Reputation: 1150
We are consuming a large (multi-GB) network stream serialised as JSON over HTTP, using the Newtonsoft.Json NuGet package to deserialise the response stream into in-memory records for further manipulation.
Given the data volumes, we stream the response a chunk at a time and would like to optimise this process, as we are hitting CPU limits.
One candidate for optimisation seems to be the JsonTextReader, which constantly allocates new objects and hence triggers garbage collection.
We have followed the advice from the Newtonsoft Performance Tips.
I've created a sample .NET console app simulating the behaviour: as the JsonTextReader reads through the response stream, it allocates new string instances for every property name and value.
Question: Is there anything else we can tweak/override to reuse already-allocated property name/value instances, given that in the real world 95% of them are repeated (in the test it's the same record, so 100% repetition)?
Sample app:
Install-Package Newtonsoft.Json -Version 12.0.2
Install-Package System.Buffers -Version 4.5.0
Program.cs
using System;
using System.Buffers;
using System.IO;
using System.Linq;
using System.Text;
using Newtonsoft.Json;

namespace JsonNetTester
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var sr = new MockedStreamReader())
            using (var jtr = new JsonTextReader(sr))
            {
                // does not seem to make any difference
                //jtr.ArrayPool = JsonArrayPool.Instance;

                // every read is allocating new objects
                while (jtr.Read())
                {
                }
            }
        }

        // simulates a continuous stream of records serialised as JSON
        public class MockedStreamReader : StreamReader
        {
            private bool initialProvided = false;
            private byte[] initialBytes = Encoding.Default.GetBytes("[");
            private static readonly byte[] recordBytes;
            int nextStart = 0;

            static MockedStreamReader()
            {
                // build a single record of 50 "Key[i]": "Value[i]" pairs
                var recordSb = new StringBuilder("{");
                Enumerable.Range(0, 50).ToList().ForEach(i =>
                {
                    if (i > 0)
                    {
                        recordSb.Append(",");
                    }
                    recordSb.Append($"\"Key{i}\": \"Value{i}\"");
                });
                recordSb.Append("},");
                recordBytes = Encoding.Default.GetBytes(recordSb.ToString());
            }

            public MockedStreamReader() : base(new MemoryStream())
            { }

            public override int Read(char[] buffer, int index, int count)
            {
                // keep reading the same record in a loop
                if (this.initialProvided)
                {
                    var start = nextStart;
                    var length = Math.Min(recordBytes.Length - start, count);
                    var end = start + length;
                    nextStart = end >= recordBytes.Length ? 0 : end;
                    Array.Copy(recordBytes, start, buffer, index, length);
                    return length;
                }
                else
                {
                    // emit the opening '[' of the array exactly once
                    initialProvided = true;
                    Array.Copy(initialBytes, 0, buffer, index, initialBytes.Length);
                    return initialBytes.Length;
                }
            }
        }

        // reuses pooled char buffers during parsing
        public class JsonArrayPool : IArrayPool<char>
        {
            public static readonly JsonArrayPool Instance = new JsonArrayPool();

            public char[] Rent(int minimumLength)
            {
                return ArrayPool<char>.Shared.Rent(minimumLength);
            }

            public void Return(char[] array)
            {
                ArrayPool<char>.Shared.Return(array);
            }
        }
    }
}
Allocations can be observed via Visual Studio Debug > Performance Profiler > .NET Object Allocation Tracking, or via the Performance Monitor # Gen 0 Collections / # Gen 1 Collections counters.
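For a rough in-code check (a sketch; the one-million-token cap is arbitrary), Gen 0 collections can also be compared programmatically:
var gen0Before = GC.CollectionCount(0);
using (var sr = new MockedStreamReader())
using (var jtr = new JsonTextReader(sr))
{
    // read a fixed number of tokens, then report how many collections they triggered
    for (var i = 0; i < 1_000_000 && jtr.Read(); i++)
    {
    }
}
Console.WriteLine($"Gen 0 collections: {GC.CollectionCount(0) - gen0Before}");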
Upvotes: 3
Views: 2514
Reputation: 117105
Answering in parts:
Setting JsonTextReader.ArrayPool as you are doing already (which is also shown in DemoTests.ArrayPooling()) should help minimize memory pressure due to allocation of intermediate character arrays during parsing. It will not, however, reduce memory use due to allocation of strings, which seems to be your complaint.
As of Release 12.0.1, Json.NET has the ability to reuse instances of property name strings by setting JsonTextReader.PropertyNameTable to some appropriate JsonNameTable subclass. This mechanism is used during deserialization, by JsonSerializer.SetupReader(), to set a name table on the reader that returns the property names stored by the contract resolver, thus preventing repeated allocation of known property names expected by the serializer.
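To illustrate the built-in path (a minimal sketch; Record is a hypothetical POCO whose properties Key0..Key49 match the sample payload):
var serializer = JsonSerializer.CreateDefault();
using (var sr = new MockedStreamReader())
using (var jtr = new JsonTextReader(sr))
{
    jtr.Read(); // positions the reader on StartArray
    while (jtr.Read() && jtr.TokenType == JsonToken.StartObject)
    {
        // Deserialize() calls SetupReader() internally, installing the
        // contract resolver's name table so known property names are
        // allocated once and then reused.
        var record = serializer.Deserialize<Record>(jtr);
    }
}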
You, however, are not using a serializer; you are reading directly, and so are not taking advantage of this mechanism. To enable it, you could create your own custom JsonNameTable to cache the property names you actually encounter:
public class AutomaticJsonNameTable : DefaultJsonNameTable
{
    int nAutoAdded = 0;
    int maxToAutoAdd;

    public AutomaticJsonNameTable(int maxToAdd)
    {
        this.maxToAutoAdd = maxToAdd;
    }

    public override string Get(char[] key, int start, int length)
    {
        var s = base.Get(key, start, length);
        if (s == null && nAutoAdded < maxToAutoAdd)
        {
            // cache names as we first encounter them, up to the cap
            s = new string(key, start, length);
            Add(s);
            nAutoAdded++;
        }
        return s;
    }
}
And then use it as follows:
const int MaxPropertyNamesToCache = 200; // Set through experiment.
var nameTable = new AutomaticJsonNameTable(MaxPropertyNamesToCache);

using (var sr = new MockedStreamReader())
using (var jtr = new JsonTextReader(sr) { PropertyNameTable = nameTable })
{
    // Process as before.
}
This should substantially reduce memory pressure due to property names.
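To confirm the effect (a rough sketch; GC.GetAllocatedBytesForCurrentThread() requires .NET Core 3.0 or later, and the token count is arbitrary), allocations with and without the table can be compared:
static long MeasureAllocations(JsonNameTable nameTable)
{
    var before = GC.GetAllocatedBytesForCurrentThread();
    using (var sr = new MockedStreamReader())
    using (var jtr = new JsonTextReader(sr) { PropertyNameTable = nameTable })
    {
        // read a fixed number of tokens so both runs do equal work
        for (var i = 0; i < 100_000 && jtr.Read(); i++)
        {
        }
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

// The run with the caching table should allocate visibly fewer bytes.
Console.WriteLine($"without table: {MeasureAllocations(null):N0} bytes");
Console.WriteLine($"with table:    {MeasureAllocations(new AutomaticJsonNameTable(200)):N0} bytes");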
Note that AutomaticJsonNameTable will only auto-cache a specified, finite number of names, to prevent memory allocation attacks. You'll need to determine this maximum number through experimentation. You could also manually hardcode the addition of expected, known property names, as sketched below.
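For instance (reusing the names from the sample payload; MaxPropertyNamesToCache is the constant defined above):
var nameTable = new AutomaticJsonNameTable(MaxPropertyNamesToCache);
for (var i = 0; i < 50; i++)
{
    nameTable.Add($"Key{i}"); // pre-seed property names known in advance
}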
Note also that, by manually specifying a name table, you prevent use of the serializer-specified name table during deserialization. If your parsing algorithm involves reading through the file to locate specific nested objects, then deserializing those objects, you might get better performance by temporarily nulling out the name table before deserialization, e.g. with the following extension method:
public static class JsonSerializerExtensions
{
    public static T DeserializeWithDefaultNameTable<T>(this JsonSerializer serializer, JsonReader reader)
    {
        JsonNameTable old = null;
        var textReader = reader as JsonTextReader;
        if (textReader != null)
        {
            // remove the custom table so the serializer can install its own
            old = textReader.PropertyNameTable;
            textReader.PropertyNameTable = null;
        }
        try
        {
            return serializer.Deserialize<T>(reader);
        }
        finally
        {
            if (textReader != null)
                textReader.PropertyNameTable = old;
        }
    }
}
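Usage might then look like this (a sketch; TargetRecord is a hypothetical type for the nested objects being deserialized):
var serializer = JsonSerializer.CreateDefault();
using (var sr = new MockedStreamReader())
using (var jtr = new JsonTextReader(sr) { PropertyNameTable = nameTable })
{
    while (jtr.Read())
    {
        if (jtr.TokenType == JsonToken.StartObject)
        {
            // the serializer temporarily gets its own name table back here
            var record = serializer.DeserializeWithDefaultNameTable<TargetRecord>(jtr);
        }
    }
}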
It would need to be determined by experimentation whether using the serializer's name table gives better performance than your own (and I have not done any such experiment as part of writing this answer).
There is currently no way to prevent JsonTextReader from allocating strings for property values, even when skipping or otherwise ignoring those values. See "please should support real skipping (no materialization of properties/etc)" (#1021) for a similar enhancement request.
Your only option here would appear to be to fork your own version of JsonTextReader and add this capability yourself. You'd need to find all calls to SetToken(JsonToken.String, _stringReference.ToString(), ...) and replace the call to _stringReference.ToString() with something that doesn't unconditionally allocate memory.
For instance, if you have a large chunk of JSON you would like to skip through, you could add a string DummyValue property to JsonTextReader:
public partial class MyJsonTextReader : JsonReader, IJsonLineInfo
{
    public string DummyValue { get; set; }
And then add the following logic where required (in two places currently):
string text = DummyValue ?? _stringReference.ToString();
SetToken(JsonToken.String, text, false);
Or
SetToken(JsonToken.String, DummyValue ?? _stringReference.ToString(), false);
Then, when reading value(s) you know can be skipped, you would set MyJsonTextReader.DummyValue to some stub, say "dummy value".
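In use, that might look like the following (a sketch against the hypothetical fork; the constructor signature is assumed to mirror JsonTextReader's):
var myReader = new MyJsonTextReader(sr);
// ... read until reaching a subtree you don't care about ...
myReader.DummyValue = "dummy value"; // string tokens now reuse this stub
myReader.Skip();                     // skip the subtree without materializing its values
myReader.DummyValue = null;          // restore normal string allocation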
Alternatively, if you have many non-skippable repeated property values that you can predict in advance, you could create a second JsonNameTable StringValueNameTable and, when it is non-null, try looking up the StringReference in it like so:
var text = StringValueNameTable?.Get(_stringReference.Chars, _stringReference.StartIndex, _stringReference.Length) ?? _stringReference.ToString();
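Populating that table could be as simple as (a sketch; StringValueNameTable is the hypothetical member added to the fork):
var valueTable = new DefaultJsonNameTable();
for (var i = 0; i < 50; i++)
{
    valueTable.Add($"Value{i}"); // string values known to repeat in the payload
}
myReader.StringValueNameTable = valueTable;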
Unfortunately, forking your own JsonTextReader may require substantial ongoing maintenance, since you will also need to fork any and all Newtonsoft utilities used by the reader (there are many) and update them for any breaking changes in the original library.
You could also vote up or comment on enhancement request #1021 requesting this ability, or add a similar request yourself.
Upvotes: 5