Maxim Akhmedov

Reputation: 699

Pre-serializing some fields of a proto message

Suppose I have a proto structure that looks like the following:

message TMessage {
    optional TDictionary dictionary = 1;
    optional int32 specificField1 = 2;
    optional TOtherMessage specificField2 = 3;
    ...
}

Suppose I am using C++. This message is used by the master process to send information to a number of nodes over the network. In particular, the dictionary field is 1) pretty heavy and 2) the same for all serialized messages, while the specific fields that follow carry a relatively small amount of information specific to the destination node.

Of course, the dictionary is built only once, but it turns out that most of the running time is spent serializing the common dictionary part again and again for each new node.

The obvious optimization would be to pre-serialize the dictionary into a byte string and put it into TMessage as a bytes field, but this looks a bit nasty to me.

Am I right that there is no built-in way to pre-serialize a message field without ruining the message structure? It sounds like a good idea for a plugin for the proto compiler.

Upvotes: 2

Views: 346

Answers (2)

jpa

Reputation: 12176

Marc's answer is perfect for your use case. Here is just another option:

  1. The field must be a submessage, like your TDictionary is.
  2. Have another variant of the outer message, with bytes in place of the submessage you want to preserialize:
    message TMessage_preserialized {
        optional bytes dictionary = 1;
        ...
    }
  3. Now you can serialize the TDictionary separately and put the resulting data in the bytes field. In the protobuf wire format, submessage fields and bytes fields are written out the same way: a tag, a length, then the payload. This means you can serialize as TMessage_preserialized and still deserialize as a normal TMessage.
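
To make the "written out the same way" point concrete, here is a minimal sketch that hand-encodes field 1 without the protobuf library. The helper names (AppendVarint, ReadVarint, WrapAsField1, ExtractField1Payload) are hypothetical and exist only for this illustration; the pre-serialized dictionary is treated as an opaque byte string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append a base-128 varint, the integer encoding used by the protobuf wire format.
void AppendVarint(std::string& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Read a varint starting at `pos` and advance `pos` past it.
// Assumes well-formed input; a real parser would bounds-check.
uint64_t ReadVarint(const std::string& in, std::size_t& pos) {
    uint64_t result = 0;
    for (int shift = 0;; shift += 7) {
        const uint8_t byte = static_cast<uint8_t>(in[pos++]);
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            return result;
        }
    }
}

// Tag for field number 1 with wire type 2 (length-delimited).
// The same tag is used whether field 1 is declared as bytes or as a submessage.
constexpr uint64_t kDictionaryTag = (1u << 3) | 2;

// What a writer emits for `optional bytes dictionary = 1`:
// tag, payload length, then the payload copied verbatim.
std::string WrapAsField1(const std::string& preserialized) {
    std::string out;
    AppendVarint(out, kDictionaryTag);
    AppendVarint(out, preserialized.size());
    out += preserialized;
    return out;
}

// What a reader of `optional TDictionary dictionary = 1` does at the wire level:
// read the tag and length, then hand the payload to the submessage parser.
std::string ExtractField1Payload(const std::string& message) {
    std::size_t pos = 0;
    const uint64_t tag = ReadVarint(message, pos);
    assert(tag == kDictionaryTag);
    const uint64_t length = ReadVarint(message, pos);
    return message.substr(pos, length);
}
```

With the generated classes, the equivalent would presumably be to call SerializeAsString() on the TDictionary once and pass the result to set_dictionary() on each TMessage_preserialized; that mapping is an assumption about your generated code, not something checked here.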

Upvotes: 2

Marc Gravell

Reputation: 1062905

Protobuf is designed such that concatenation === composition, at least for the root message. That means that you can serialize an object with just the dictionary, and snapshot the bytes somewhere. Now for each of the real messages you can paste down that snapshot, and then serialize an object with just the other fields - just whack it straight after: no additional syntax is required. This is semantically identical to serializing them all at the same time. In fact, since it will retain the field order, it should actually be identical bytes too.

It helps that you used "optional" throughout :)
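
The concatenation idea can be sketched without the protobuf library by hand-encoding the two fields. This is an illustration under assumptions: the function names are hypothetical, the dictionary is an opaque byte string standing in for a serialized TDictionary, and specificField1 is treated as a non-negative integer so that it fits a plain varint.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Append a base-128 varint, the integer encoding used by the protobuf wire format.
void AppendVarint(std::string& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Field 1 (dictionary): wire type 2, i.e. tag, length, payload.
void AppendDictionary(std::string& out, const std::string& dictionaryBytes) {
    AppendVarint(out, (1u << 3) | 2);
    AppendVarint(out, dictionaryBytes.size());
    out += dictionaryBytes;
}

// Field 2 (specificField1): wire type 0, i.e. tag, varint value.
void AppendSpecificField1(std::string& out, uint64_t value) {
    AppendVarint(out, (2u << 3) | 0);
    AppendVarint(out, value);
}

// A message holding only the dictionary: this is the snapshot to reuse.
std::string SerializeDictionaryOnly(const std::string& dictionaryBytes) {
    std::string out;
    AppendDictionary(out, dictionaryBytes);
    return out;
}

// A message holding only the per-node field.
std::string SerializeSpecificOnly(uint64_t value) {
    std::string out;
    AppendSpecificField1(out, value);
    return out;
}

// The whole message serialized in one pass, for comparison.
std::string SerializeWhole(const std::string& dictionaryBytes, uint64_t value) {
    std::string out;
    AppendDictionary(out, dictionaryBytes);
    AppendSpecificField1(out, value);
    return out;
}

// Build the bytes for every node while encoding the dictionary only once.
std::vector<std::string> SerializeForNodes(const std::string& dictionaryBytes,
                                           const std::vector<uint64_t>& perNodeValues) {
    const std::string snapshot = SerializeDictionaryOnly(dictionaryBytes);
    std::vector<std::string> result;
    for (uint64_t value : perNodeValues) {
        std::string wire = snapshot;           // paste the snapshot
        wire += SerializeSpecificOnly(value);  // append the small per-node part
        // Illustration only: the result matches a one-pass serialization.
        assert(wire == SerializeWhole(dictionaryBytes, value));
        result.push_back(std::move(wire));
    }
    return result;
}
```

With the generated C++ classes, this would presumably correspond to calling SerializeToString() once on a TMessage that has only dictionary set, then, per node, copying that string and calling AppendToString() on a TMessage that has only the specific fields set. Treat that mapping as an assumption to verify against your own setup.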

Upvotes: 2
