tozevv

Reputation: 784

Partial and asynchronous deserialization in C# with protobuf-net

Context

I have a file with the following structure:

[ProtoContract]
public class Data
{
    [ProtoMember(1)]
    public string Header { get; set; }

    [ProtoMember(2)]
    public byte[] Body { get; set; }
}

The code that reads and writes the data to a file runs in an ASP.NET MVC Web API context. I'm trying to keep all IO asynchronous to minimize blocking and achieve the best scalability. File streams do support ReadAsync, WriteAsync and CopyToAsync.

The body can be reasonably large (>> header) and I only need to read the body if the header matches some specific criteria.

I can read and deserialize just the header synchronously, and then read and deserialize the body the same way, by using the approach explained in "Deserialize part of a binary file".

Problem

How can I use asynchronous file IO to do exactly the same: read and deserialize the header asynchronously, and then read and deserialize the body the same way?

I've read Asynchronous protobuf serialization is not an option.

Upvotes: 1

Views: 1625

Answers (1)

Marc Gravell

Reputation: 1062945

Technically protobuf fields can be out-of-order, but in most cases (including the one you show) we can reasonably assume the fields are in-order (the only way to get them out-of-order here would be to separately serialize two half-classes and concatenate the results, which is technically valid in the protobuf specification).

So; what we will have is:

  • a varint denoting: field 1, length-delimited (string) - always decimal 10
  • a varint denoting "a", the length of the header
  • "a" bytes, the UTF-8 encoded header
  • a varint denoting: field 2, length-delimited (here a byte[], encoded the same way as a string) - always decimal 18
  • a varint denoting "b", the length of the body
  • "b" bytes, the body
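
To see where the constants 10 and 18 in the list above come from: in the protobuf wire format a field's tag is the field number shifted left by 3 bits, OR-ed with the wire type, and length-delimited data (strings, byte arrays) uses wire type 2. A minimal sketch of that arithmetic follows; the `WireTags` class and `MakeTag` helper are hypothetical names used only for illustration and are not part of protobuf-net.

```csharp
using System;

static class WireTags
{
    // Wire type 2 means "length-delimited" (strings, bytes, sub-messages).
    public const int LengthDelimited = 2;

    // Hypothetical helper: tag = (fieldNumber << 3) | wireType.
    // The tag itself is then written to the stream as a varint.
    public static int MakeTag(int fieldNumber, int wireType)
    {
        return (fieldNumber << 3) | wireType;
    }
}
```

With this, `WireTags.MakeTag(1, WireTags.LengthDelimited)` gives 10 for Header and `WireTags.MakeTag(2, WireTags.LengthDelimited)` gives 18 for Body. Both are below 128, so each occupies a single byte as a varint.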

We can probably assume that "a" is >= 0 and < int.MaxValue, which means it will take at most 5 bytes to encode; so, if you buffer at least 6 bytes (1 for the field tag plus up to 5 for the length), you will have enough information to know how large the header is. Of course, those bytes could also contain the start of the header, and technically even part of the body, so you'd need to keep hold of them! But if you had a sync-over-async Stream, you can read just that part of the stream by something like:

int protoHeader = ProtoReader.DirectReadVarintInt32(stream); // 10
int headerLength = ProtoReader.DirectReadVarintInt32(stream);
string header = ProtoReader.DirectReadString(stream, headerLength);

Or if the "sync over async" is tricky, explicit reading:

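using System.IO;
using ProtoBuf; // protobuf-net

static class Program
{
    // Stand-in for the first async read: tag (0x0A), length (0x0B = 11), then "hello"
    static byte[] ReadAtLeast6()
    {
        return new byte[] { 0x0A, 0x0B, 0x68, 0x65, 0x6C, 0x6C, 0x6F };
    }
    // Stand-in for a follow-up async read of the missing bytes: " world"
    static byte[] ReadMore(int bytes)
    {
        return new byte[] { 0x20, 0x77, 0x6F, 0x72, 0x64 == 0 ? (byte)0 : (byte)0x6C, 0x64 };
    }
    static void Main()
    {
        // pretend we read 7 bytes async
        var data = ReadAtLeast6();
        using (var ms = new MemoryStream())
        {
            ms.Write(data, 0, data.Length);
            ms.Position = 0;
            int protoHeader = ProtoReader.DirectReadVarintInt32(ms); // 10
            int headerLength = ProtoReader.DirectReadVarintInt32(ms); // 11

            // bytes of the header that have not been buffered yet
            int needed = (headerLength + (int)ms.Position) - data.Length; // 6 more
            var pos = ms.Position; // remember where the header text starts
            ms.Seek(0, SeekOrigin.End);
            data = ReadMore(needed);
            ms.Write(data, 0, needed);
            ms.Position = pos; // rewind to the start of the header text
            string header = ProtoReader.DirectReadString(ms, headerLength); // "hello world"
        }
    }
}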
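
If you would rather avoid the intermediate MemoryStream, the same layout can also be decoded by hand on top of `Stream.ReadAsync`. The following is only a sketch under the assumptions above (fields in order, header is field 1, body is field 2); it does not use protobuf-net at all, and `AsyncHeaderReader`, `ReadExactlyAsync`, `ReadVarint32Async`, `ReadHeaderAsync` and `ReadBodyAsync` are hypothetical names introduced here, not library API. It reads varints one byte at a time, so it should be used over a buffered stream (a FileStream has its own buffer).

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class AsyncHeaderReader
{
    // ReadAsync may return fewer bytes than requested, so loop until done.
    static async Task<byte[]> ReadExactlyAsync(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = await stream.ReadAsync(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }

    // Base-128 varint: 7 payload bits per byte, high bit set means "more follows".
    static async Task<int> ReadVarint32Async(Stream stream)
    {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            byte b = (await ReadExactlyAsync(stream, 1))[0];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new InvalidDataException("Varint is longer than 5 bytes.");
    }

    // Reads a length-delimited field, checking that the expected tag comes next.
    static async Task<byte[]> ReadFieldAsync(Stream stream, int expectedTag)
    {
        int tag = await ReadVarint32Async(stream);
        if (tag != expectedTag)
            throw new InvalidDataException("Unexpected field tag: " + tag);
        int length = await ReadVarint32Async(stream);
        if (length < 0) throw new InvalidDataException("Negative length.");
        return await ReadExactlyAsync(stream, length);
    }

    // Field 1 (tag 10): the UTF-8 encoded header.
    public static async Task<string> ReadHeaderAsync(Stream stream)
    {
        return Encoding.UTF8.GetString(await ReadFieldAsync(stream, 10));
    }

    // Field 2 (tag 18): the body; call only if the header matched.
    public static async Task<byte[]> ReadBodyAsync(Stream stream)
    {
        return await ReadFieldAsync(stream, 18);
    }
}
```

A caller would `await AsyncHeaderReader.ReadHeaderAsync(stream)`, test the header against its criteria, and only then `await AsyncHeaderReader.ReadBodyAsync(stream)` on the same stream. If the file was written with out-of-order fields, the tag check throws rather than silently misreading the data.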

Upvotes: 2
