Bob

Reputation: 4970

nanopb, google-protobuf - can I set the length of the message as part of the serialized data itself?

I have a message

message Msg
{
    uint32 a = 1;
    uint32 b = 2;
    bool c = 3;
}

When I write a message using pb_encode, I notice that stream.bytes_written depends on how many of the Msg fields differ from their defaults.

I really do not want to send a separate stream.bytes_written parameter in addition to a char * buffer.

I'm thinking of doing something like this:

message Msg_ser
{
    uint32 size = 1;
    bytes Msg_ser_dat = 2 [(nanopb).max_size = 32];
}

So pb_encode would first write Msg into Msg_ser.Msg_ser_dat, and then Msg_ser itself would be serialized.
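
For illustration, here's a minimal sketch of what I mean, in C with nanopb. It assumes the generated header name (msg.pb.h) and the usual nanopb-generated names for the .proto above (Msg_fields, Msg_ser_fields, Msg_ser_init_zero); serialize_with_size is just a helper I've made up:

#include <pb_encode.h>
#include "msg.pb.h"   /* assumed name of the nanopb-generated header */

bool serialize_with_size(const Msg *msg, uint8_t *out, size_t out_len,
                         size_t *written)
{
    Msg_ser ser = Msg_ser_init_zero;

    /* Encode the inner Msg straight into the bytes field of the wrapper. */
    pb_ostream_t inner = pb_ostream_from_buffer(ser.Msg_ser_dat.bytes,
                                                sizeof(ser.Msg_ser_dat.bytes));
    if (!pb_encode(&inner, Msg_fields, msg))
        return false;
    ser.Msg_ser_dat.size = (pb_size_t)inner.bytes_written;
    ser.size = (uint32_t)inner.bytes_written;

    /* Then encode the wrapper itself into the caller's buffer. */
    pb_ostream_t outer = pb_ostream_from_buffer(out, out_len);
    if (!pb_encode(&outer, Msg_ser_fields, &ser))
        return false;
    *written = outer.bytes_written;
    return true;
}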

QUESTION
Is there anything wrong with this approach of storing the size of the buffer in the buffer itself?

Upvotes: 0

Views: 214

Answers (1)

bazza

Reputation: 8404

Yes, there is something wrong with that approach.

Unless something has changed fairly recently, there is no intention in GPB that messages be self-demarcating. You have to have some separate means of marking the beginning / end of a message if it is going to be stored or transmitted amongst other GPB messages.

If one contrived to do as you suggest, and the wire format just so happened to let the recipient read the size field before anything else, fine. But there's no guarantee of that: GPB does not promise any particular field ordering on the wire, so the size field might not come first.

Sending a separate bytes_written value is one way of doing it, e.g. the first 4 bytes sent are parsed as a native integer giving the number of subsequent bytes in the GPB-encoded message. OpenStreetMap, which makes heavy use of GPB, has a little protocol in its data files saying how long the next GPB message is and what sort of message it is, which lets a reader easily skip ahead.
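
As a sketch of that sort of length-prefix framing (the fixed 4-byte big-endian prefix here is my choice; a native-endian integer as described above works too, as long as both ends agree):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Prepend a 4-byte big-endian length to a GPB payload.
 * Returns total bytes written to `out`, or 0 if `out` is too small. */
size_t frame_message(const uint8_t *payload, size_t len,
                     uint8_t *out, size_t out_len)
{
    if (len > UINT32_MAX || out_len < 4 || out_len - 4 < len)
        return 0;
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len);
    memcpy(out + 4, payload, len);
    return 4 + len;
}

/* Read the prefix back; the receiver then knows to read exactly
 * that many more bytes before handing them to the GPB decoder. */
uint32_t read_prefix(const uint8_t *hdr)
{
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
         | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}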

Another issue with the idea is that it assumes every byte sent is received. This is definitely not the case with, for example, RS232 connections; the sender can be merrily sending out a byte stream, but if the receiver isn't connected, powered on, running and keeping up, those bytes are gone forever. The receiver might therefore start receiving part way through a message, with no idea that the first bytes it gets are not in fact the size field. In that circumstance it is best to have some sort of unique message start / end byte pattern that the recipient can detect, discarding read bytes until the pattern turns up.
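
To illustrate, a minimal receiver-side sketch; the two sync bytes 0xA5 0x5A are an arbitrary pattern I've picked for the example:

#include <stdint.h>
#include <stddef.h>

#define SYNC0 0xA5u   /* arbitrary sync pattern for this sketch */
#define SYNC1 0x5Au

/* Scan received bytes for the sync pattern; return the offset of the
 * first byte after it, or -1 if not found. Everything before the
 * pattern is discarded as garbage from a partial message. */
ptrdiff_t find_frame_start(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == SYNC0 && buf[i + 1] == SYNC1)
            return (ptrdiff_t)(i + 2);
    return -1;
}

In practice you'd also want a length field and a checksum after the pattern, so that a false sync (the pattern happening to occur inside a payload) gets rejected rather than producing a garbage message.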

I know that these are a nuisance, but it's definitely for the best. You have to have a separate message demarcation protocol sitting underneath your GPB messaging layer (if one were looking at it like a layered protocol stack). You simply cannot sensibly shoehorn one protocol layer into another, especially when the technology (GPB) has no intention of supporting that.

Another way (if you have a network or other reliable stream connection) is to use a protocol like ZeroMQ, which looks after message demarcation for you.
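
For example, a minimal sender sketch with the libzmq C API (the PAIR socket type and the endpoint address are just placeholders for the example):

#include <zmq.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_PAIR);
    zmq_connect(sock, "tcp://127.0.0.1:5555");  /* placeholder endpoint */

    uint8_t buf[64];
    size_t len = 0;   /* ... fill buf with pb_encode, set len ... */

    /* One zmq_send is one message; the peer's zmq_recv returns
     * exactly these bytes as a unit - no extra framing needed. */
    zmq_send(sock, buf, len, 0);

    zmq_close(sock);
    zmq_ctx_destroy(ctx);
    return 0;
}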

Other serialisations are self-demarcating. XML is (opening / closing tags have to balance), JSON is (matching curly braces { }), and some ASN.1 wire formats are too, but GPB is not.

Upvotes: 1
