Jack Edmonds

Reputation: 33151

How to use Python and Google's Protocol Buffers to deserialize data sent over TCP

I'm trying to write an application which uses Google's protocol buffers to deserialize data (sent from another application using protocol buffers) over a TCP connection. The problem is that it looks as if protocol buffers in Python can only deserialize data from a string. Since TCP doesn't have well-defined message boundaries and one of the messages I'm trying to receive has a repeated field, I won't know how much data to try and receive before finally passing the string to be deserialized.

Are there any good practices for doing this in Python?

Upvotes: 18

Views: 14529

Answers (3)

davidA

Reputation: 13664

Another aspect to consider (albeit for a simpler case) is using a single TCP connection for a single message. In this case, as long as you know what the expected message is (or use Union Types to determine the message type at run time), you can treat the opening of the TCP connection as the 'start' delimiter and the connection close event as the 'end' delimiter. This has the advantage that you'll receive the entire message quickly (whereas in other cases the TCP stream can be held for a time, delaying the receipt of your entire message). If you do this, you don't need any explicit in-band framing, because the lifetime of the TCP connection acts as the frame itself.
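A minimal sketch of this connection-per-message approach, assuming the sender half-closes its side with `socket.shutdown(socket.SHUT_WR)` once the serialized message has been written. The helper names `send_one_message` and `recv_one_message` are hypothetical; the bytes returned by the receiver would then be passed to the generated class's `ParseFromString()`.

```python
import socket


def send_one_message(sock: socket.socket, payload: bytes) -> None:
    """Send one serialized message, then half-close so EOF marks its end."""
    sock.sendall(payload)          # keep writing until every byte is sent
    sock.shutdown(socket.SHUT_WR)  # no more data from us; peer's recv() will return b""


def recv_one_message(sock: socket.socket) -> bytes:
    """Read until the peer stops sending; everything received is one message."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the peer has finished sending
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

With a generated class (`MyMessage` is a placeholder), the receiver would do `msg = MyMessage(); msg.ParseFromString(recv_one_message(conn))`. Using `shutdown()` rather than `close()` on the sender leaves the read side open, so it can still receive a reply on the same connection.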

Upvotes: 0

frymaster

Reputation: 219

To expand on J.J.'s (entirely correct) answer: the protobuf library has no way to work out on its own how long a message is, or what type of protobuf object is being sent*. So the other application that's sending you data must already be doing something like this.

When I had to do this, I implemented a lookup table:

    messageLookup = {0: foobar_pb2.MessageFoo, 1: foobar_pb2.MessageBar, 2: foobar_pb2.MessageBaz}

...and did essentially what J.J. did, but I also had a helper function:

    def parseMessage(self, msgType, stringMessage):
        msgClass = messageLookup[msgType]       # pick the generated class for this type id
        message = msgClass()
        message.ParseFromString(stringMessage)  # fill the new object from the raw bytes
        return message

...which I called to turn the string into a protobuf object.
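The lookup table only helps if the type id travels with each message. A minimal sketch of one way to do that, assuming a hypothetical header made of a one-byte type id plus a four-byte big-endian length; `frame` and `parse_frame` are illustrative names, and `lookup` plays the role of `messageLookup` above.

```python
import struct
from typing import Dict, Type

# Hypothetical header layout: 1-byte type id + 4-byte length, network byte order.
HEADER = struct.Struct("!BI")


def frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix a serialized message with its type id and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def parse_frame(buf: bytes, lookup: Dict[int, Type]):
    """Use the type id to pick a class from the lookup table, then parse the body."""
    msg_type, length = HEADER.unpack_from(buf)
    body = buf[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("incomplete frame")
    message = lookup[msg_type]()    # e.g. foobar_pb2.MessageFoo()
    message.ParseFromString(body)   # method provided by generated protobuf classes
    return message
```

On the sending side, `payload` would come from the message's `SerializeToString()`.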

(*) I think it's possible to get round this by encapsulating specific messages inside a container message.

Upvotes: 4

J.J.

Reputation: 5069

Don't just write the serialized data to the socket. First send a fixed-size field containing the length of the serialized object.

The sending side is roughly:

    sock.sendall(struct.pack("!H", len(data)))    # send a two-byte size field (network byte order, max 65535)
    sock.sendall(data)
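A slightly fuller sketch of the sending side, assuming `data` is the output of the message's `SerializeToString()`. It uses `"!H"` (network byte order) rather than plain `"H"` so sender and receiver agree on byte order regardless of platform; `send_framed` is a hypothetical helper name.

```python
import socket
import struct

MAX_LEN = 0xFFFF  # largest length a two-byte "!H" field can hold


def send_framed(sock: socket.socket, data: bytes) -> None:
    """Send a 2-byte big-endian length prefix followed by the serialized message."""
    if len(data) > MAX_LEN:
        raise ValueError(f"message of {len(data)} bytes does not fit a 2-byte length field")
    # sendall() keeps sending until every byte is written; send() may write only part.
    sock.sendall(struct.pack("!H", len(data)) + data)
```

If messages can exceed 65,535 bytes, a four-byte `"!I"` length field is the usual choice.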

And the recv'ing side becomes something like:

    rfile = sock.makefile("rb")                          # create once per connection; read(n) blocks until n bytes or EOF
    dataToRead = struct.unpack("!H", rfile.read(2))[0]
    data = rfile.read(dataToRead)
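A single `recv()` call can return fewer bytes than requested, so a receiver built on plain `socket.recv()` needs a loop. A minimal sketch, assuming the two-byte `"!H"` prefix used by the sender; `recv_exactly` and `recv_framed` are hypothetical helper names. The returned bytes go to `ParseFromString()` on the appropriate message class.

```python
import socket
import struct


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Call recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection mid-message
            raise ConnectionError(f"connection closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def recv_framed(sock: socket.socket) -> bytes:
    """Read one 2-byte big-endian length prefix, then that many payload bytes."""
    (length,) = struct.unpack("!H", recv_exactly(sock, 2))
    return recv_exactly(sock, length)
```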

This is a common design pattern for socket programming. Most designs extend the over-the-wire structure to include a type field as well, so your receiving side becomes something like:

    rfile = sock.makefile("rb")                          # create once per connection
    msgType = rfile.read(1)[0]                           # get the type of msg (indexing bytes gives an int)
    dataToRead = struct.unpack("!H", rfile.read(2))[0]   # get the len of the msg
    data = rfile.read(dataToRead)                        # read the msg

    if msgType == TYPE_FOO:
        handleFoo(data)

    elif msgType == TYPE_BAR:
        handleBar(data)

    else:
        raise UnknownTypeException(msgType)
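The if/elif chain can also be written as a dispatch table keyed on the type byte, so adding a message type is a one-line change. A minimal sketch, assuming the `"!BH"` header (one-byte type, two-byte big-endian length); `dispatch_tlv`, `read_exact` and the handler table are hypothetical names, and `UnknownTypeException` is defined here only so the sketch runs.

```python
import struct
from typing import BinaryIO, Callable, Dict

TLV_HEADER = struct.Struct("!BH")  # 1-byte type + 2-byte length, network byte order


class UnknownTypeException(Exception):
    """Raised for a type byte with no registered handler."""


def read_exact(rfile: BinaryIO, n: int) -> bytes:
    """Read n bytes from a buffered binary stream, failing on a short read (EOF)."""
    data = rfile.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def dispatch_tlv(rfile: BinaryIO, handlers: Dict[int, Callable[[bytes], object]]):
    """Read one type-length-value record and hand the value to the matching handler."""
    msg_type, length = TLV_HEADER.unpack(read_exact(rfile, TLV_HEADER.size))
    data = read_exact(rfile, length)
    handler = handlers.get(msg_type)
    if handler is None:
        raise UnknownTypeException(msg_type)
    return handler(data)
```

On a live connection, `rfile` would be `sock.makefile("rb")`, created once per connection so bytes already buffered are not lost between records.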

You end up with an over-the-wire message format that looks like:

    struct {
        unsigned char  type;
        unsigned short length;
        unsigned char  data[];   /* `length` bytes, sent inline after the header */
    };
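The C struct is a picture of the layout rather than something to send as raw memory: a compiler normally inserts a padding byte after `type`, and the payload bytes follow the header inline. Python's `struct` module shows the difference between native alignment and the packed wire format (assuming a typical platform where `unsigned short` is 2-byte aligned):

```python
import struct

# Native mode (the default "@") applies C alignment rules, so a pad byte is
# inserted after the 1-byte type to put the 2-byte length on an even offset.
native_size = struct.calcsize("BH")

# "!" selects network (big-endian) byte order with standard sizes and no padding.
wire_size = struct.calcsize("!BH")

# A header for type 7 with a 5-byte payload, as it would go on the wire.
header = struct.pack("!BH", 7, 5)

print(native_size, wire_size, header)
```

Both sides should therefore pack and unpack with the same explicit format string rather than relying on the in-memory layout of a C struct.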

This does a reasonable job of future-proofing the wire protocol against unforeseen requirements. It's a Type-Length-Value protocol, which you'll find again and again and again in network protocols.

Upvotes: 37
