Daniel Peñalba

Reputation: 31857

Do protocol buffers support lightweight reference serialization?

Suppose a class like this:

class Person {
  public string Name;
  public Person Parent;
}

Now you create two objects:

...
Person mike = new Person { Name = "Mike" };
Person jack = new Person { Name = "Jack" };
jack.Parent = mike;
List<Person> people = new List<Person>();
people.Add(mike);
people.Add(jack);
...

Will the string "Mike" be serialized once (maintaining a unique reference to the object mike and then resolving it), or will it be serialized twice?

Upvotes: 1

Views: 1045

Answers (1)

Marc Gravell

Reputation: 1062955

The answer here is "it depends". The protobuf specification does not include any object identifiers / re-use, so normally (and by default) this would be a tree serialization, and the data would be duplicated.

We can examine this by using protobuf-net with all the default behaviours:

using ProtoBuf;
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        Person mike = new Person { Name = "Mike" };
        Person jack = new Person { Name = "Jack" };
        jack.Parent = mike;
        List<Person> people = new List<Person>();
        people.Add(mike);
        people.Add(jack);

        var cloneOfEverything = Serializer.DeepClone(people);

        var newMike = cloneOfEverything.Single(x => x.Name == "Mike");
        var newJack = cloneOfEverything.Single(x => x.Name == "Jack");
        Console.WriteLine(newJack.Parent.Name); // writes Mike as expected

        bool areSamePersonObject = ReferenceEquals(newMike, newJack.Parent);
        // False ^^^

        bool areSameStringInstance = ReferenceEquals(
            newMike.Name, newJack.Parent.Name);
        // True ^^^
    }
}
[ProtoContract]
class Person
{
    [ProtoMember(1)]
    public string Name;
    [ProtoMember(2)]
    public Person Parent;
}

Observations:

  • Jack's parent is correctly called Mike
  • but it is a different object instance that just looks identical
  • the string is the same instance - as an implementation detail, protobuf-net includes code that spots the same UTF-8 chunk in the data, and re-uses the same string instance to avoid allocations - but the data was included twice in the binary
  • for info, the above tree takes 24 bytes
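To see where the 24 bytes come from, the layout can be reproduced by hand without the library. This is only a sketch: it assumes the root list is written as a repeated length-delimited field 1 (an assumption that happens to match the size reported above), and WireSketch and its methods are made-up names, not part of protobuf-net.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class WireSketch
{
    // Encodes one length-delimited field: tag byte, length byte, payload.
    // Single-byte tag and length suffice here (field numbers below 16,
    // payloads shorter than 128 bytes).
    public static byte[] LengthDelimited(int fieldNumber, byte[] payload)
    {
        var result = new List<byte>();
        result.Add((byte)((fieldNumber << 3) | 2)); // wire type 2 = length-delimited
        result.Add((byte)payload.Length);
        result.AddRange(payload);
        return result.ToArray();
    }

    // A person is field 1 (Name) plus, optionally, field 2 (Parent),
    // where the parent is embedded in full as a nested message.
    public static byte[] Person(string name, byte[] parent = null)
    {
        var result = new List<byte>(LengthDelimited(1, Encoding.UTF8.GetBytes(name)));
        if (parent != null) result.AddRange(LengthDelimited(2, parent));
        return result.ToArray();
    }

    // Assumed root layout: each list item is written as field 1.
    public static byte[] Family()
    {
        byte[] mike = Person("Mike");       // 6 bytes
        byte[] jack = Person("Jack", mike); // 6 + 2 + 6 = 14 bytes
        var result = new List<byte>();
        result.AddRange(LengthDelimited(1, mike)); // 2 + 6
        result.AddRange(LengthDelimited(1, jack)); // 2 + 14
        return result.ToArray();
    }
}
```

WireSketch.Family().Length comes to 24, and the bytes for "Mike" appear twice in the result: once for the list item and once inside Jack's Parent field.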

We can also see this by investigating what happens here:

Person mike = new Person { Name = "Mike" };
mike.Parent = mike;
var clone = Serializer.DeepClone(mike);

Because it is writing as a tree, it errors with:

Possible recursion detected (offset: 1 level(s)): Person
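If you want to observe that failure without crashing the program, something like the following should do. It is a sketch only: TreePerson and RecursionDemo are hypothetical names, and the exact exception type is deliberately not assumed.

```csharp
using System;
using ProtoBuf;

[ProtoContract]
class TreePerson // hypothetical name; same shape as Person above
{
    [ProtoMember(1)]
    public string Name;
    [ProtoMember(2)]
    public TreePerson Parent;
}

static class RecursionDemo
{
    // Returns the error message if cloning fails, or null if it succeeds.
    public static string TryCloneSelfParent()
    {
        var mike = new TreePerson { Name = "Mike" };
        mike.Parent = mike; // a cycle cannot be written as a finite tree
        try
        {
            Serializer.DeepClone(mike);
            return null;
        }
        catch (Exception ex) // exact exception type is not assumed here
        {
            return ex.Message;
        }
    }
}
```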

However! As a library-specific implementation detail, protobuf-net includes lots of knobs and dials that you can turn. One of these relates to object identity. We can toggle Person to act with reference identity:

[ProtoContract(AsReferenceDefault=true)]
class Person {...}

This changes the data in the binary (to include additional markers) and, as a consequence, the same lines now work:

Person mike = new Person { Name = "Mike" };
mike.Parent = mike;
var clone = Serializer.DeepClone(mike);
bool areSamePersonObject = ReferenceEquals(clone, clone.Parent);
// ^^^ true

Note that this uses implementation-specific details, and may confuse other implementations.

AsReferenceDefault here states that Person should be treated as a reference whenever it is seen; for more granular control, [ProtoMember] also includes an AsReference option which can be applied to individual members. A quick check, however, seems to indicate that it is not currently working properly with List<Person> - I will need to investigate that. There may be a good reason, but I can't think of one currently, and I suspect that is a bug.
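As a rough illustration of the member-level form, the sketch below marks only Parent as a reference. It assumes a protobuf-net version in which AsReference is available; RefParentPerson, Household and MemberRefDemo are hypothetical names, and the Household container is used instead of a root List because of the list issue mentioned above.

```csharp
using ProtoBuf;

[ProtoContract]
class RefParentPerson // hypothetical name
{
    [ProtoMember(1)]
    public string Name;

    // Only this member is reference-tracked; Name is written as usual.
    [ProtoMember(2, AsReference = true)]
    public RefParentPerson Parent;
}

[ProtoContract]
class Household // hypothetical container with two plain members
{
    [ProtoMember(1)]
    public RefParentPerson First;
    [ProtoMember(2)]
    public RefParentPerson Second;
}

static class MemberRefDemo
{
    // Builds two children that share one parent object, then clones them.
    public static Household CloneSharedParent()
    {
        var mike = new RefParentPerson { Name = "Mike" };
        var home = new Household
        {
            First = new RefParentPerson { Name = "Jack", Parent = mike },
            Second = new RefParentPerson { Name = "Jill", Parent = mike }
        };
        return Serializer.DeepClone(home);
    }
}
```

Checking ReferenceEquals(clone.First.Parent, clone.Second.Parent) on the result is the quickest way to see whether the shared parent survived as a single object in your version.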

AsReference can also be included on string members to avoid writing the same string repeatedly - although note that in this case it is probably cheaper to write "Mike" twice! This option would be useful when the same string is repeated lots of times. Despite the name, when working with string, AsReference is interpreted as "string equality", not "reference equality".
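Whether this pays off is easy to measure rather than guess. The sketch below serializes the same repeated string with and without AsReference and returns the payload size; PlainLabels, SharedLabels and SizeDemo are hypothetical names, and it again assumes a protobuf-net version in which AsReference is available.

```csharp
using System.IO;
using ProtoBuf;

[ProtoContract]
class PlainLabels // hypothetical: strings written in full each time
{
    [ProtoMember(1)]
    public string A;
    [ProtoMember(2)]
    public string B;
    [ProtoMember(3)]
    public string C;
}

[ProtoContract]
class SharedLabels // hypothetical: equal strings may be written once
{
    [ProtoMember(1, AsReference = true)]
    public string A;
    [ProtoMember(2, AsReference = true)]
    public string B;
    [ProtoMember(3, AsReference = true)]
    public string C;
}

static class SizeDemo
{
    // Serializes the value to memory and returns the number of bytes written.
    public static long SizeOf<T>(T value)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, value);
            return ms.Length;
        }
    }
}
```

Calling SizeDemo.SizeOf on a PlainLabels and a SharedLabels instance whose three members hold the same long string, and then again with a short string such as "Mike", shows where the extra markers start to cost less than the duplicated text.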

Upvotes: 3
