Varun Tulsian

Reputation: 113

Design pattern for streaming protobuf messages

I want to stream protobuf messages to a file.

I have a protobuf message

message car {
     ... // some fields
}

My Java code creates multiple objects of this car message.

How should I stream these messages to a file?

As far as I know, there are two ways of going about it.

  1. Define a wrapper message such as cars

    message cars {
      repeated car c = 1;
    }
    

    and have the Java code build a single cars object and then write it to a file.

  2. Write the car messages one after another to a single file using the writeDelimitedTo method.

I am wondering which is the more efficient way to stream messages with protobuf.

When should I use pattern 1 and when should I be using pattern 2?

This is what I found at https://developers.google.com/protocol-buffers/docs/techniques#large-data

I am not clear on what it is trying to say.

Large Data Sets

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.

Upvotes: 5

Views: 8421

Answers (1)

Bruce Martin

Reputation: 10543

Have a look at Previous Question. Any difference in size and time will be minimal (option 1 may be slightly faster, option 2 slightly smaller).

My advice would be:

  1. Option 2 for big files, since you can process the file message by message.
  2. Option 1 if multiple languages are needed. In the past, delimited messages were not supported in all languages, though this seems to be changing.
  3. Otherwise, personal preference.
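
To make the message-by-message point concrete, here is a minimal sketch of the kind of framing writeDelimitedTo uses: each record is written as a varint length prefix followed by its serialized bytes, so a reader can pull one record at a time without loading the whole file. It uses only the JDK, with plain byte arrays standing in for serialized car messages. DelimitedFraming, writeDelimited and readDelimited are hypothetical names for illustration, not part of the protobuf API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: length-prefixed framing for already-serialized records. */
final class DelimitedFraming {

    /** Writes the payload length as a base-128 varint, then the payload bytes. */
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        int len = payload.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80); // low 7 bits, continuation bit set
            len >>>= 7;
        }
        out.write(len);                     // final byte, continuation bit clear
        out.write(payload);
    }

    /** Reads one record, or returns null on a clean end of stream. */
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;                    // no more records
        }
        int len = 0;
        int shift = 0;
        while (true) {
            len |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
        }
        if (len < 0) {
            throw new IOException("invalid record length");
        }
        byte[] payload = new byte[len];
        int off = 0;
        while (off < len) {                 // read() may return fewer bytes than asked
            int n = in.read(payload, off, len - off);
            if (n == -1) {
                throw new EOFException("truncated record");
            }
            off += n;
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for serialized car messages (assumption: any byte[] will do here).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {"car-1", "car-2", "car-3"}) {
            writeDelimited(out, s.getBytes(StandardCharsets.UTF_8));
        }
        // Read the records back one at a time until end of stream.
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

In real code you would not hand-roll this: call writeDelimitedTo(out) on each message and loop on the generated class's parseDelimitedFrom(in), which as far as I know returns null at end of stream. With option 1, by contrast, the whole cars message has to be built and parsed in one go.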

Upvotes: 1
