Differences in benchmarking two packages?

Question

I'm a student (noob) tasked with benchmarking Google's Protocol Buffers and Apache Thrift serialization packages.

My issue is that Apache Thrift takes THREE calls to serialize to a string, but Google Protocol Buffers takes only ONE.

The extra Apache Thrift calls set up an in-memory buffer and a protocol before the message is serialized.

Should I include those memory setup calls in my Apache Thrift benchmark so that it is equivalent to the single Google call?

Are there any guideposts or best practices for benchmarking something like this?

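# Apache Thrift
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
# Person is the class generated by the Thrift compiler from your .thrift file

person1 = Person()
person1.name = "person1"
person1.id = 1
person1.email = "test@test.com"
# three calls
transportOut = TTransport.TMemoryBuffer()
protocolOut = TBinaryProtocol.TBinaryProtocol(transportOut)
person1.write(protocolOut)
serialized = transportOut.getvalue()  # retrieve the serialized bytes from the buffer
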
# Google Protocol Buffers
# Person is the class generated by protoc from your .proto file
person1 = Person()
person1.name = "person1"
person1.id = 1
person1.email = "test@test.com"
# one call
serialized = person1.SerializeToString()

Thanks in advance!

Kenton Varda · Accepted Answer

The exact use of the API can indeed make a huge difference in benchmark performance, although whether it takes one call or three is not necessarily the central issue. For example, in Protobuf-C++ you can use SerializeToString() to get an std::string, but if you are ultimately writing that string to a file, it may be faster to use SerializeToFileDescriptor(). You need to be careful to use the best API for the job in order to create a fair benchmark.

In Python (which it looks like you are using), there is no other way to serialize than to a string. However, there is SerializePartialToString(), which skips checking whether required fields are present. Using it may be faster, since it does less work. Whether dropping that work is "fair" is highly debatable: many apps actually do not want required-field checking, but others do. This is where benchmarks get really murky.
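To make the fairness question concrete, here is a minimal, self-contained Python sketch. It does not use Protobuf at all: `serialize`, `REQUIRED`, and `MissingRequiredField` are hypothetical stand-ins, and the `check_required` flag only imitates the idea behind SerializePartialToString(). The point is that the checked and unchecked variants do different work, so a benchmark should time them separately and state which one it reports.

```python
import timeit

# Hypothetical required fields for this toy schema (not a real Protobuf schema).
REQUIRED = ("name", "id")


class MissingRequiredField(ValueError):
    """Raised by the checked variant when a required field is absent."""


def serialize(fields, check_required=True):
    """Toy serializer over a dict of field values (hypothetical, for illustration).

    check_required=True  -> validate REQUIRED first (like SerializeToString()).
    check_required=False -> skip validation (the idea behind SerializePartialToString()).
    """
    if check_required:
        missing = [name for name in REQUIRED if name not in fields]
        if missing:
            raise MissingRequiredField("missing: " + ", ".join(missing))
    # Deterministic text encoding so both variants emit identical bytes.
    return ";".join(f"{key}={fields[key]}" for key in sorted(fields)).encode("utf-8")


if __name__ == "__main__":
    person = {"name": "person1", "id": 1, "email": "test@test.com"}
    # Time each variant separately and label the result with the variant used.
    for checked in (True, False):
        seconds = timeit.timeit(lambda: serialize(person, check_required=checked), number=100_000)
        print(f"check_required={checked}: {seconds:.3f} s per 100,000 calls")
```

Whichever variant you pick for the real benchmark, pick the matching one for Thrift as well, or note that Thrift does not perform an equivalent check.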

I imagine Thrift has similar issues, though I'm not very familiar with its API.

Ultimately you need to carefully study the APIs that are available, decide what specific use case you want to target, and then choose what you think is most appropriate for that case.

To answer your specific question, though, I think you should include all setup that is specific to the particular message instance. An apples-to-apples comparison should include all per-message-instance setup and teardown.
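A minimal sketch of that rule in Python, using only the standard library. `encode_person` is a hypothetical stand-in for `person1.write(protocolOut)`, and `io.BytesIO` plays the role of `TTransport.TMemoryBuffer()`; for a real benchmark you would swap in the generated Thrift and Protobuf classes. The first variant creates the buffer inside the timed region (the per-message setup recommended above); the second reuses one buffer, which measures a different use case.

```python
import io
import struct
import timeit


def encode_person(out, name, person_id, email):
    """Hypothetical stand-in encoder: 4-byte little-endian id, then length-prefixed UTF-8 strings."""
    out.write(struct.pack("<i", person_id))
    for text in (name, email):
        data = text.encode("utf-8")
        out.write(struct.pack("<I", len(data)))
        out.write(data)


def serialize_including_setup():
    # Per-message setup is inside the timed region, like creating
    # TMemoryBuffer() and TBinaryProtocol() for every message.
    buf = io.BytesIO()
    encode_person(buf, "person1", 1, "test@test.com")
    return buf.getvalue()


_shared_buf = io.BytesIO()


def serialize_excluding_setup():
    # Setup hoisted out: one buffer is reset and reused, so only the write
    # is measured. This answers a different question (buffer reuse).
    _shared_buf.seek(0)
    _shared_buf.truncate()
    encode_person(_shared_buf, "person1", 1, "test@test.com")
    return _shared_buf.getvalue()


def best_us_per_call(fn, number=20000, repeat=5):
    """Best-of-`repeat` average time per call, in microseconds."""
    times = timeit.repeat(fn, number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == "__main__":
    # Sanity check: both variants must produce identical bytes before timing them.
    assert serialize_including_setup() == serialize_excluding_setup()
    for fn in (serialize_including_setup, serialize_excluding_setup):
        print(f"{fn.__name__}: {best_us_per_call(fn):.2f} us/call")
```

Checking that the variants produce identical bytes guards against comparing code paths that do different work. Taking the minimum over several `timeit.repeat` runs is one common way to reduce noise from other processes; it is a choice, not a requirement.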

BTW, be sure to try enabling the C-extension-backed Protobuf-Python implementation (see the Protobuf Python documentation). It's a whole lot faster. Of course, this raises another use-case question: are you imagining a case where C extensions are allowed? For example, they are not allowed on App Engine. (I'm also not sure whether Thrift uses C extensions; you should check on that.)
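Before comparing numbers, it may help to record which backend each library is actually using. The sketch below assumes protobuf's `google.protobuf.internal.api_implementation.Type()` (an internal module, so it may change between versions) and assumes Thrift's compiled extension is importable as `thrift.protocol.fastbinary` when it was built. Both checks fall back gracefully if the package is not installed.

```python
import importlib


def protobuf_backend():
    """Return the active Protobuf-Python backend name, or None if protobuf is not installed."""
    try:
        # Internal module: assumed to exist, but not a stable public API.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    # Typically "python", "cpp", or "upb", depending on the protobuf version and build.
    return api_implementation.Type()


def thrift_has_fastbinary():
    """Return True if Thrift's compiled fastbinary extension can be imported (assumed module path)."""
    try:
        importlib.import_module("thrift.protocol.fastbinary")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("protobuf backend:", protobuf_backend())
    print("thrift fastbinary available:", thrift_has_fastbinary())
```

For protobuf, the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable can select the backend in versions that ship more than one. On the Thrift side, `TBinaryProtocol.TBinaryProtocolAccelerated` is the protocol class intended to use the extension when it is present.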

It might be interesting to throw in a comparison with Cap'n Proto while you're at it. (Disclosure: I am the author of Cap'n Proto in C++, and also (in the past) of Protobufs v2 in C++ and Java, but not of the Python versions in either case.)
