Robert Massa

Reputation: 4608

How to read back appended objects using protobuf-net?

I'm appending real-time events to a file stream using protobuf-net serialization. How can I stream all saved objects back for analysis? I don't want to use an in-memory collection (because it would be huge).

private IEnumerable<Activity> Read() {
  using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.OpenOrCreate, FileAccess.Read, this.storage))
  using (var sr = new StreamReader(iso)) {
    while (!sr.EndOfStream) {
      yield return Serializer.Deserialize<Activity>(iso); // doesn't work
    }
  }
}

public void Append(Activity activity) {
  using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.Append, FileAccess.Write, this.storage)) {
    Serializer.Serialize(iso, activity);
  }
}

Upvotes: 3

Views: 628

Answers (1)

Marc Gravell

Reputation: 1064114

First, I need to discuss the protobuf format (via Google, not specific to protobuf-net). By design, it is appendable but with append===merge. For lists this means "append as new items", but for single objects this means "combine the members". Secondly, as a consequence of the above, the root object in protobuf is never terminated - the "end" is simply: when you run out of incoming data. Thirdly, and again as a direct consequence - fields are not required to be in any specific order, and generally will overwrite. So: if you just use Serialize lots of times, and then read the data back: you will have exactly one object, which will have basically the values from the last object on the stream.
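To illustrate (the Id and Name members here are just placeholders, not your real Activity type - this is a sketch of the merge behaviour only):

[ProtoContract]
public class Activity {
  [ProtoMember(1)] public int Id { get; set; }
  [ProtoMember(2)] public string Name { get; set; }
}

var ms = new MemoryStream();
Serializer.Serialize(ms, new Activity { Id = 1, Name = "first" });
Serializer.Serialize(ms, new Activity { Id = 2, Name = "second" });
ms.Position = 0;

// Deserialize reads until the data runs out and merges everything it finds
// into a single object, so the later values overwrite the earlier ones:
var merged = Serializer.Deserialize<Activity>(ms);
// merged.Id == 2, merged.Name == "second"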

What you want to do, though, is a very common scenario. So protobuf-net helps you out by including the SerializeWithLengthPrefix and DeserializeWithLengthPrefix methods. If you use these instead of Serialize / Deserialize, then it is possible to correctly parse individual objects. Basically, the length-prefix restricts the data so that only the exact amount per-object is read (rather than reading to the end of the file).
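For example, the write side of your Append could switch to something like this (a sketch based on your code; the field-number of 1 is explained below):

public void Append(Activity activity) {
  using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.Append, FileAccess.Write, this.storage)) {
    // same stream handling as before, just length-prefixed per object
    Serializer.SerializeWithLengthPrefix(iso, activity, PrefixStyle.Base128, 1);
  }
}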

I strongly suggest (as parameters) using tag===field-number===1, and the base-128 prefix-style (an enum). As well as making the data fully protobuf compliant throughout (including the prefix data), this will make it easy to use an extra helper method: DeserializeItems. This exposes each consecutive object via an iterator-block, making it efficient to read huge files without needing everything in memory at once. It even works with LINQ.
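So your Read could become something along these lines (again a sketch, matching the Base128 / field 1 choices used when writing):

private IEnumerable<Activity> Read() {
  using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.OpenOrCreate, FileAccess.Read, this.storage)) {
    foreach (var activity in Serializer.DeserializeItems<Activity>(iso, PrefixStyle.Base128, 1)) {
      yield return activity; // streamed one object at a time; the file is never fully in memory
    }
  }
}

And because it is just an IEnumerable<Activity>, it composes with LINQ, e.g. Read().Skip(10).Take(100).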

There is also a way to use the API to selectively parse/skip different objects in the file - for example, to skip the first 532 records without processing the data. Let me know if you need an example of that.
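As a rough sketch of the skip idea (this assumes the records were written with PrefixStyle.Base128 and field 1 as above, and that Serializer.TryReadLengthPrefix consumes the field marker and leaves the stream at the start of the payload - check against your protobuf-net version):

static void SkipRecords(Stream source, int count) {
  for (int i = 0; i < count; i++) {
    int length;
    if (!Serializer.TryReadLengthPrefix(source, PrefixStyle.Base128, out length)) {
      break; // ran out of data before skipping 'count' records
    }
    source.Seek(length, SeekOrigin.Current); // jump over the payload without parsing it
  }
}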

If you already have lots of data that was stored with Serialize rather than SerializeWithLengthPrefix, then it is probably still possible to decipher it, by using ProtoReader to detect when the field-numbers loop back around: meaning, given fields "1, 2, 4, 5, 1, 3, 2, 5", we can probably conclude there are 3 objects there and decipher accordingly. Again, let me know if you need a specific example.
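Very roughly, the boundary detection could look like this (ProtoReader.Create is assumed here - older protobuf-net builds expose a constructor instead, so adjust to your version; no model is needed just to walk the fields):

static int CountProbableObjects(Stream source) {
  int objects = 0, lastField = 0;
  using (var reader = ProtoReader.Create(source, null, null)) {
    int field;
    while ((field = reader.ReadFieldHeader()) > 0) {
      if (lastField != 0 && field <= lastField) {
        objects++; // field numbers looped back around: probably a new object
      }
      lastField = field;
      reader.SkipField(); // only the structure matters here, not the values
    }
  }
  if (lastField != 0) objects++; // the trailing object
  return objects;
}

Given the fields "1, 2, 4, 5, 1, 3, 2, 5" from above, this reports 3 objects.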

Upvotes: 3
