shashwat

Reputation: 8014

Using protobuf-net to write fixed size objects in parts and read them one-by-one

Use Case Description

I receive collections in chunks from a server, and I want to write them to a file in such a way that I can read them back one by one later. My objects are fixed size, meaning the class contains only members of type double, long and DateTime.

I already serialize and deserialize objects using the methods below at different places in my project:

public static T Deserialize<T>(byte[] buffer)
{
  using (MemoryStream stream = new MemoryStream(buffer))
  {
    return Serializer.Deserialize<T>(stream);
  }
}

public static byte[] Serialize<T>(T message)
{
  using (MemoryStream stream = new MemoryStream())
  {
    Serializer.Serialize(stream, message);
    return stream.ToArray();
  }
}

Even if this works, I think it will produce a larger output file, because I believe protobuf stores some information about field names (in its own way). I could instead create the byte[] with BinaryWriter, without storing any field-name information. I know I would need to read the values back in the right order, but I think this could still noticeably reduce the output file size, especially when the number of objects in the collection is very large.

Is there a way to write collections efficiently in parts and read them back one by one, while keeping both the output files and the memory footprint during reading as small as possible? My collections are very large, containing years of market data that I need to read and process. I only need to read each object once, process it, and forget about it; I have no need to keep objects in memory.

Upvotes: 1

Views: 547

Answers (1)

Marc Gravell

Reputation: 1063884

Protobuf doesn't store field names, but it does prefix each field with an encoded integer (the field number and wire type). For storing multiple objects, you would typically use the *WithLengthPrefix overloads, because the encoded size of each object is not fixed; in particular, DateTime has no reliable fixed-length encoding.
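
For illustration, here is a minimal sketch of the length-prefixed approach, assuming the protobuf-net NuGet package is referenced. The Tick class, its members and the TickFile helper are hypothetical names made up for this example; only Serializer.SerializeWithLengthPrefix, Serializer.DeserializeItems and PrefixStyle come from protobuf-net.

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical record type; member names are illustrative only.
[ProtoContract]
public class Tick
{
    [ProtoMember(1)] public long TimestampTicks { get; set; }
    [ProtoMember(2)] public double Price { get; set; }
    [ProtoMember(3)] public long Volume { get; set; }
}

public static class TickFile
{
    // Append one received chunk; every object gets its own length prefix.
    public static void AppendChunk(string path, IEnumerable<Tick> chunk)
    {
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            foreach (var tick in chunk)
                Serializer.SerializeWithLengthPrefix(stream, tick, PrefixStyle.Base128, 1);
        }
    }

    // Lazily yield the objects one at a time; nothing is retained after processing.
    public static IEnumerable<Tick> ReadAll(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            foreach (var tick in Serializer.DeserializeItems<Tick>(stream, PrefixStyle.Base128, 1))
                yield return tick;
        }
    }
}
```

The prefix style and field number used when reading must match the ones used when writing. Because ReadAll is an iterator, a foreach over it keeps only the current object alive.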

However! In your case, perhaps a serializer isn't the right tool. I would consider:

  • creating a readonly struct composed of a double and two long (or three long if you need high precision epoch time)
  • using a memory mapped file to access the file system directly
  • creating a Span<byte> over the memory mapped file (or a section thereof)
  • coercing the Span<byte> to a Span<YourStruct> using MemoryMarshal.Cast

Et voilà: direct access to your values, all the way to the file system.
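
As a rough sketch of those steps, and not a definitive implementation: the TickRecord struct, its fields and the TickStore helper below are hypothetical, and the code assumes .NET Core 2.1 or later with unsafe code enabled (AllowUnsafeBlocks), since taking a pointer to the mapped view is one way to build the span.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical fixed-size record: 8 + 8 + 8 = 24 bytes, no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct TickRecord
{
    public readonly long TimestampTicks; // e.g. DateTime.Ticks in UTC (assumption)
    public readonly double Price;
    public readonly long Volume;

    public TickRecord(long timestampTicks, double price, long volume)
    {
        TimestampTicks = timestampTicks;
        Price = price;
        Volume = volume;
    }
}

public static class TickStore
{
    // Append a chunk by reinterpreting the structs as raw bytes.
    public static void AppendChunk(string path, ReadOnlySpan<TickRecord> chunk)
    {
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            stream.Write(MemoryMarshal.AsBytes(chunk));
        }
    }

    // Map the file and pass each record to the callback, one at a time.
    public static unsafe void ForEach(string path, Action<TickRecord> process)
    {
        long length = new FileInfo(path).Length;
        if (length == 0) return; // an empty file cannot be mapped

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            byte* ptr = null;
            view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
            try
            {
                // A span holds at most int.MaxValue elements; larger files
                // would need to be mapped section by section.
                var bytes = new ReadOnlySpan<byte>(ptr + view.PointerOffset, checked((int)length));
                ReadOnlySpan<TickRecord> ticks = MemoryMarshal.Cast<byte, TickRecord>(bytes);
                foreach (ref readonly TickRecord tick in ticks)
                    process(tick);
            }
            finally
            {
                view.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
    }
}
```

Each record occupies exactly 24 bytes on disk in the machine's native byte order, so a file written this way is only portable between machines with the same endianness.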

Upvotes: 1
