Using protobuf-net to write fixed size objects in parts and read them one-by-one

Question

Use Case Description

I receive collections in chunks from a server and want to write them to a file in such a way that I can read them back one-by-one later. My objects are fixed size, meaning the class only contains fields of type double, long and DateTime.

I already serialize and deserialize objects using the methods below in different places in my project:

public static T Deserialize<T>(byte[] buffer)
{
  using (MemoryStream stream = new MemoryStream(buffer))
  {
    return Serializer.Deserialize<T>(stream);
  }
}

public static byte[] Serialize<T>(T message)
{
  using (MemoryStream stream = new MemoryStream())
  {
    Serializer.Serialize(stream, message);
    return stream.ToArray();
  }
}

But even if this could work, I think it will produce a larger output file, because I believe protobuf stores some information about field names (in its own way). I could instead create the byte[] using BinaryWriter without any field-name information. I know I would need to make sure that I read the fields back in the right order, but I think this could still have a meaningful impact on the output file size, especially when the number of objects in the collection is very large.
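To make the BinaryWriter idea concrete, here is a minimal sketch of what I have in mind. The Tick type, its field names and the TickFile helper are hypothetical placeholders, not my real classes, and the reader assumes a seekable stream such as a FileStream or MemoryStream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record: one double, one long and one DateTime.
public readonly struct Tick
{
    // 8 bytes (double) + 8 bytes (long) + 8 bytes (DateTime.ToBinary()).
    public const int SizeInBytes = 24;

    public Tick(double price, long volume, DateTime time)
    {
        Price = price;
        Volume = volume;
        Time = time;
    }

    public double Price { get; }
    public long Volume { get; }
    public DateTime Time { get; }
}

public static class TickFile
{
    // Writes one chunk at the current position of the stream.
    // Call it once per chunk received; the stream is left open.
    public static void AppendChunk(Stream destination, IEnumerable<Tick> chunk)
    {
        using (var writer = new BinaryWriter(destination, Encoding.UTF8, leaveOpen: true))
        {
            foreach (Tick tick in chunk)
            {
                // The field order here must match the order in ReadAll.
                writer.Write(tick.Price);
                writer.Write(tick.Volume);
                writer.Write(tick.Time.ToBinary());
            }
        }
    }

    // Lazily yields one record at a time, so only a single record
    // needs to be alive while the caller processes it.
    public static IEnumerable<Tick> ReadAll(Stream source)
    {
        long length = source.Length; // requires a seekable stream
        using (var reader = new BinaryReader(source, Encoding.UTF8, leaveOpen: true))
        {
            while (source.Position + Tick.SizeInBytes <= length)
            {
                double price = reader.ReadDouble();
                long volume = reader.ReadInt64();
                DateTime time = DateTime.FromBinary(reader.ReadInt64());
                yield return new Tick(price, volume, time);
            }
        }
    }
}
```

With this layout every record costs exactly 24 bytes, so the file size is simply 24 times the number of objects.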

Is there a way to efficiently write collections in parts and read them back one-by-one, while keeping both the output file size and the memory footprint during reading to a minimum? My collections are really large, containing years of market data that I need to read and process. I just need to read each object once, process it, and forget about it; I do not need to keep objects in memory.

Marc Gravell · Accepted Answer

Protobuf doesn't store field names, but it does use a field prefix that is an encoded integer. For storing multiple objects, you would typically use the *WithLengthPrefix overloads, because the encoded size can vary from object to object; in particular, DateTime has no reliable fixed-length encoding.
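As a rough sketch of the length-prefix approach, assuming the protobuf-net package is referenced: the Quote class, its member numbers and the QuoteLog helper below are hypothetical, chosen only for illustration. Each object is appended with its own length prefix, and DeserializeItems streams the objects back lazily.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical message type; the member numbers are arbitrary but must stay stable.
[ProtoContract]
public class Quote
{
    [ProtoMember(1)] public double Price { get; set; }
    [ProtoMember(2)] public long Volume { get; set; }
    [ProtoMember(3)] public DateTime Time { get; set; }
}

public static class QuoteLog
{
    // Appends one chunk to the file; call once per chunk received.
    public static void Append(string path, IEnumerable<Quote> chunk)
    {
        using (var file = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            foreach (Quote quote in chunk)
            {
                // Each item is written as: field header 1, varint length, payload.
                Serializer.SerializeWithLengthPrefix(file, quote, PrefixStyle.Base128, 1);
            }
        }
    }

    // Yields the items one by one; nothing is retained after the caller moves on.
    public static IEnumerable<Quote> ReadAll(string path)
    {
        using (var file = File.OpenRead(path))
        {
            foreach (Quote quote in Serializer.DeserializeItems<Quote>(file, PrefixStyle.Base128, 1))
            {
                yield return quote;
            }
        }
    }
}
```

The same PrefixStyle and field number have to be used on both sides, otherwise the reader will not find the items.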

However! In your case, perhaps a serializer isn't the right tool. I would consider:

creating a readonly struct composed of a double and two long (or three long if you need high precision epoch time)
using a memory mapped file to access the file system directly
creating a Span<byte> over the memory mapped file (or a section thereof)
coercing the Span<byte> to a Span<YourStruct> using MemoryMarshal.Cast

et voila, direct access to your values all the way to the file system.
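The steps above could look roughly like the following. This is a sketch under stated assumptions rather than a definitive implementation: TickRecord, TickStore and their members are hypothetical names, the project must allow unsafe code (AllowUnsafeBlocks), Stream.Write(ReadOnlySpan<byte>) needs .NET Core 2.1 or later, and the file holds the machine's native byte order, so it is not portable across architectures with different endianness.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical fixed-size record: a double and two longs, 24 bytes with no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct TickRecord
{
    public readonly double Price;
    public readonly long Volume;
    public readonly long TimeTicks; // DateTime stored as UTC ticks

    public TickRecord(double price, long volume, long timeTicks)
    {
        Price = price;
        Volume = volume;
        TimeTicks = timeTicks;
    }

    public DateTime Time => new DateTime(TimeTicks, DateTimeKind.Utc);
}

public static class TickStore
{
    // Appends one chunk by writing the raw bytes of the structs.
    public static void Append(string path, ReadOnlySpan<TickRecord> chunk)
    {
        using (var file = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            file.Write(MemoryMarshal.AsBytes(chunk));
        }
    }

    // Example consumer: visits every record once, straight from the mapped file.
    public static unsafe double SumPrices(string path)
    {
        long length = new FileInfo(path).Length;
        if (length == 0) return 0.0; // an empty file cannot be mapped

        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            byte* ptr = null;
            view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
            try
            {
                // A span is limited to int.MaxValue bytes; for larger files,
                // map and process one section at a time instead.
                var bytes = new ReadOnlySpan<byte>(ptr + view.PointerOffset, checked((int)length));
                ReadOnlySpan<TickRecord> ticks = MemoryMarshal.Cast<byte, TickRecord>(bytes);

                double sum = 0;
                for (int i = 0; i < ticks.Length; i++)
                {
                    sum += ticks[i].Price;
                }
                return sum;
            }
            finally
            {
                view.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
    }
}
```

Because the records are read in place, no per-object allocation or deserialization step happens while iterating.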
