Reputation: 1290
I am currently using Json.NET to deserialise a string that contains a mid-sized collection of objects, about 7,000 items in total.
Each item has a recurring group of 4 identical strings; memory profiling shows that this creates about 40,000 references, depending on nesting etc.
Is there a way to get the serializer to use the same reference for each identical string?
Example Json:
[{
"name":"jon bones",
"groups":[{
"groupName":"Region",
"code":"1"
},{
"groupName":"Class",
"code":"4"
}]
},
{
"name":"Swan moans",
"groups":[{
"groupName":"Region",
"code":"12"
},{
"groupName":"Class",
"code":"1"
}]
}]
Added an example. As you can see, the groupName values repeat on almost all objects; only the relevant codes change. It's not a great concern, but as the dataset grows I would rather not increase allocations too much.
Also, it might seem like the "code" may repeat, but it is unique for each person. Basically, these are multiple identifiers for the same object.
Upvotes: 4
Views: 2606
Reputation: 8088
As an alternative to the converters provided in the other answers (especially https://stackoverflow.com/a/39605620/6713), you can write your own short-lived "interner". This means that you won't be filling up the CLR's string intern table, and once your converter falls out of scope (after your deserialization has completed), the only references left to your strings will be in the entities you've deserialized.
public class ReusableStringConverter : JsonConverter<string>
{
private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
public override string ReadJson(JsonReader reader, Type objectType, string existingValue, bool hasExistingValue, JsonSerializer serializer)
{
if (reader.TokenType == JsonToken.Null)
return null;
var str = reader.Value as string;
if (str == null)
return null;
if (str.Length == 0)
return string.Empty;
if (_items.TryGetValue(str, out var item))
{
return item;
}
else
{
_items[str] = str;
return str;
}
}
public override bool CanWrite => false;
public override void WriteJson(JsonWriter writer, string value, JsonSerializer serializer) => throw new NotImplementedException();
}
If you're not targeting netstandard2.0, you can replace the Dictionary with a HashSet<string> (HashSet<T>.TryGetValue is not available in netstandard2.0).
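A minimal sketch of the set-based variant, assuming it means `HashSet<string>` and a target framework where `HashSet<T>.TryGetValue` exists. `StringPool` and `GetOrAdd` are hypothetical names used only for illustration; they are not part of Json.NET or of the converter above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a short-lived pool that hands back one shared
// instance per distinct string value.
public sealed class StringPool
{
    // Ordinal comparison: two strings are "the same" only if their characters match exactly.
    private readonly HashSet<string> _items = new HashSet<string>(StringComparer.Ordinal);

    public string GetOrAdd(string value)
    {
        if (value == null)
            return null;

        // TryGetValue returns the instance already stored in the set, if any.
        if (_items.TryGetValue(value, out var existing))
            return existing;

        _items.Add(value);
        return value;
    }
}
```

Inside `ReadJson`, the dictionary lookup would then become a single `GetOrAdd(str)` call. The pool is collected together with the converter that owns it.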
In our very rough benchmarks, this reduced memory usage from 2.4 GB to 1.4 GB and only increased processing time from 61 seconds to 63 seconds.
Upvotes: 0
Reputation: 709
As pointed out in other answers, you need to be VERY careful with the use of String.Intern because of the lifetime of that allocation. For a small set of frequently used strings, this may be appropriate.
For our scenario, I chose to follow the pattern of the XML serializers in .NET. They use a class called "System.Xml.NameTable" to resolve unique occurrences of strings within the XML document. I followed the implementation pattern provided by 'dbc' above, but used the NameTable instead of String.Intern:
public class JsonNameTable
: System.Xml.NameTable
{
}
public class JsonNameTableConverter
: JsonConverter
{
private JsonNameTable _nameTable;
public JsonNameTableConverter(JsonNameTable nameTable)
{
_nameTable = nameTable;
}
public override bool CanConvert(Type objectType)
{
return objectType == typeof(string);
}
public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
if (reader.TokenType == JsonToken.Null)
return null;
var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)Newtonsoft.Json.Linq.JToken.Load(reader); // Check is in case the value is a non-string literal such as an integer.
if (s != null)
{
s = _nameTable.Add(s);
}
return s;
}
public override bool CanWrite { get { return false; } }
public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
throw new NotImplementedException();
}
}
Then, in the calling code, add the converter to the JSON serializer settings:
JsonNameTable nameTable = new JsonNameTable();
settings.Converters.Add(new JsonNameTableConverter(nameTable));
This allows you to share strings, and control the lifetime of the strings with a reference to the JsonNameTable.
There is probably an improvement that can be made here: NameTable will actually return an existing string given a char[], a start index and a length. It may be possible to push the NameTable one level further down, to where strings are being read off the stream, thereby avoiding the creation of duplicate strings altogether. However, I could not figure out how to do that in Json.NET.
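To illustrate the char[] overload on its own: a minimal sketch showing that `NameTable.Add(char[], int, int)` returns the already-atomized instance without the caller building an intermediate string. `NameTableDemo` and `SharesReference` are hypothetical names, and the sketch does not show how to wire this into Json.NET's reader.

```csharp
using System;
using System.Xml;

public static class NameTableDemo
{
    // Checks whether the char[] overload resolves to the string instance
    // that was added earlier through the string overload.
    public static bool SharesReference()
    {
        var table = new NameTable();

        // First occurrence: atomize the string and keep the returned instance.
        string first = table.Add("Region");

        // Later occurrence sitting inside a larger character buffer,
        // as it might while being read from a stream.
        char[] buffer = "xxRegionxx".ToCharArray();
        string second = table.Add(buffer, 2, 6); // offset 2, length 6 -> "Region"

        return ReferenceEquals(first, second);
    }
}
```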
Upvotes: 1
Reputation: 117105
If you know your 4 standard strings in advance, you can intern them with String.Intern() (or just declare them as string literals somewhere -- that does the job), then use the following custom JsonConverter to convert all JSON string literals to their interned value if one is found:
public class InternedStringConverter : JsonConverter
{
public override bool CanConvert(Type objectType)
{
return objectType == typeof(string);
}
public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
if (reader.TokenType == JsonToken.Null)
return null;
var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)JToken.Load(reader); // Check is in case the value is a non-string literal such as an integer.
return String.IsInterned(s) ?? s;
}
public override bool CanWrite { get { return false; } }
public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
throw new NotImplementedException();
}
}
This can be applied globally via serializer settings:
var settings = new JsonSerializerSettings { Converters = new [] { new InternedStringConverter() } };
var root = JsonConvert.DeserializeObject<RootObject>(jsonString, settings);
You can also apply it to the specific string collection using JsonPropertyAttribute.ItemConverterType:
public class Group
{
[JsonProperty(ItemConverterType = typeof(InternedStringConverter))]
public List<string> StandardStrings { get; set; }
}
If you don't know the 4 strings in advance, you can create a converter that interns the strings as they are read:
public class AutoInterningStringConverter : JsonConverter
{
public override bool CanConvert(Type objectType)
{
// CanConvert is not called when a converter is applied directly to a property.
throw new NotImplementedException("AutoInterningStringConverter should not be used globally");
}
public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
if (reader.TokenType == JsonToken.Null)
return null;
var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)JToken.Load(reader); // Check is in case the value is a non-string literal such as an integer.
return String.Intern(s);
}
public override bool CanWrite { get { return false; } }
public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
throw new NotImplementedException();
}
}
However, I strongly recommend against using this globally as you could end up adding enormous numbers of strings to the internal string table. Instead, only apply it to the specific string collection(s) that you are confident contain duplicates of small numbers of unique strings:
public class Group
{
[JsonProperty(ItemConverterType = typeof(AutoInterningStringConverter))]
public List<string> StandardStrings { get; set; }
}
Update
From your updated question, I see you have string properties with standard values, rather than a collection of strings with standard values. Thus you would use [JsonConverter(typeof(AutoInterningStringConverter))] on each:
public class Group
{
[JsonConverter(typeof(AutoInterningStringConverter))]
public string groupName { get; set; }
public string code { get; set; }
}
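As a rough end-to-end check of the property-level approach, the sketch below deserializes a shortened version of the question's JSON and compares the two groupName references. `DemoInterningConverter`, `DemoPerson`, `DemoGroup` and `InterningDemo` are hypothetical names; the converter is a condensed stand-in for AutoInterningStringConverter, not the code above verbatim.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical, condensed interning converter for the demo.
public class DemoInterningConverter : JsonConverter
{
    public override bool CanConvert(Type objectType) => objectType == typeof(string);

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        // Only string tokens are interned; any other token yields null here for brevity.
        var s = reader.Value as string;
        return s == null ? null : string.Intern(s);
    }

    public override bool CanWrite => false;

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        => throw new NotImplementedException();
}

public class DemoGroup
{
    [JsonConverter(typeof(DemoInterningConverter))]
    public string groupName { get; set; }
    public string code { get; set; }
}

public class DemoPerson
{
    public string name { get; set; }
    public List<DemoGroup> groups { get; set; }
}

public static class InterningDemo
{
    // Shortened sample based on the JSON in the question.
    public const string SampleJson = @"[
  {""name"":""jon bones"",""groups"":[{""groupName"":""Region"",""code"":""1""}]},
  {""name"":""Swan moans"",""groups"":[{""groupName"":""Region"",""code"":""12""}]}
]";

    // Reports whether both people ended up pointing at one "Region" instance.
    public static bool GroupNamesShareReference()
    {
        var people = JsonConvert.DeserializeObject<List<DemoPerson>>(SampleJson);
        return ReferenceEquals(people[0].groups[0].groupName, people[1].groups[0].groupName);
    }
}
```

The same check run without the attribute is a quick way to confirm that the converter, rather than something else, is what makes the references equal.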
Upvotes: 8