What is the typical memory space usage for a Google protocol buffer?

Question

I'm working on a small device that has a reasonably large set of configuration parameters (~100 KB), which are generated by PC software. In the past we've stored the parameters in a binary file and loaded them into a data structure. Maintenance is a bit annoying (different languages, making sure the order of fields in the structure matches, different versions, etc.), so we're considering going to Google protocol buffers.

From the small device's standpoint, I'm concerned about the memory space that will be required to store the serialized protocol buffer. I'm working in C, so I downloaded protobuf-embedded-c and started working on an example. I was a bit surprised by the maximum size of the buffer it was calculating. For example, what follows is the size of an empty buffer and then buffers containing a single variable of the named type:

#define MAX_M_Empty_SIZE 2
#define MAX_M_double_SIZE 12
#define MAX_M_float_SIZE 8
#define MAX_M_int32_SIZE 14
#define MAX_M_int64_SIZE 14
#define MAX_M_uint32_SIZE 9
#define MAX_M_uint64_SIZE 14
#define MAX_M_sint32_SIZE 9
#define MAX_M_sint64_SIZE 14
#define MAX_M_fixed32_SIZE 8
#define MAX_M_fixed64_SIZE 12
#define MAX_M_sfixed32_SIZE 8
#define MAX_M_sfixed64_SIZE 12
#define MAX_M_bool_SIZE 5

Every time I added an 'int32' to the structure, the maximum size increased by 14 bytes. I know that includes the key and probably a worst case for the varint encoding, but what can I expect going forward? Are larger messages more efficient than smaller messages, or is it more dependent on the encoded values?

In summary, I'm just trying to get a feel for the memory space usage of a protocol buffer. I would hate to trade ease of use for a large increase in the memory space necessary to store the configuration data. Thanks!

Marc Gravell · Accepted Answer

int32 is written as a varint, which means that for positive values the space it takes depends on the magnitude. Small positive values can be single-byte; larger positive values take more (up to 5 bytes for an int32). Negative values take a lot more space: they are sign-extended, so they take the same space as a very large 64-bit number. A "varint" is 7 bits of data plus a continuation bit in each byte, so a negative number (or a very large unsigned 64-bit value) can take 10 bytes. To avoid this, if you know your values could be negative you can use sint32 / sint64, which use zig-zag encoding (then varint); this basically makes small-magnitude values take less space than large-magnitude values, irrespective of sign.
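To make the size rules above concrete, here is a minimal sketch in C that only computes encoded payload sizes; it does not produce any bytes. The helper names (`varint_size`, `int32_size`, `zigzag32`, `sint32_size`) are made up for illustration and are not part of protobuf-embedded-c or any other library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to varint-encode v: each byte carries 7 payload bits. */
static size_t varint_size(uint64_t v)
{
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* An int32 value is sign-extended to 64 bits before varint encoding,
   which is why any negative value costs 10 bytes. */
static size_t int32_size(int32_t v)
{
    return varint_size((uint64_t)(int64_t)v);
}

/* Zig-zag mapping used by sint32: 0->0, -1->1, 1->2, -2->3, ... */
static uint32_t zigzag32(int32_t v)
{
    return ((uint32_t)v << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

/* A sint32 value is zig-zag mapped, then varint-encoded (at most 5 bytes). */
static size_t sint32_size(int32_t v)
{
    return varint_size(zigzag32(v));
}
```

With these helpers, `int32_size(1)` is 1 and `int32_size(-1)` is 10, while `sint32_size(-1)` is 1. These are payload sizes only; the field key is extra.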

If you need to optimize for worst-case, then maybe consider using fixed32 / fixed64 instead; this guarantees to take exactly 4 or 8 bytes.
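As a rough illustration of why the fixed types have a constant size: on the wire, a fixed32 / fixed64 payload is just the value's 4 or 8 bytes in little-endian order. The writer functions below are hypothetical helpers sketched for this answer, not an API of any protobuf library.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v as a fixed32 payload: always 4 bytes, little-endian.
   Returns the number of bytes written. */
static size_t write_fixed32(uint8_t *out, uint32_t v)
{
    for (size_t i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
    return 4;
}

/* Write v as a fixed64 payload: always 8 bytes, little-endian.
   Returns the number of bytes written. */
static size_t write_fixed64(uint8_t *out, uint64_t v)
{
    for (size_t i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
    return 8;
}
```

The size is the same whether the value is 1 or 0xFFFFFFFF, so the worst case equals the typical case. The trade-off is that small values no longer shrink to a single byte.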

Summary:

always (or almost always) positive, and generally of small-to-moderate size: int32/int64
positive or negative, and generally of small-to-moderate magnitude: sint32/sint64
large values, or need to guarantee size: fixed32/fixed64

There are a few others as well; the full details are in the language guide.

(In all cases above, you also need to include the field header, i.e. the key that encodes the field number and wire type, but that is usually 1 or 2 bytes.)
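For a sense of worst-case per-field cost including the header, here is a small sketch. It assumes the standard wire format, where the key is the varint of `(field_number << 3) | wire_type`. All function names are hypothetical. The results are raw wire-format sizes and make no attempt to reproduce the `MAX_..._SIZE` constants emitted by the generator in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to varint-encode v: each byte carries 7 payload bits. */
static size_t varint_size(uint64_t v)
{
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Key size: the wire type occupies the low 3 bits, so it never changes
   the byte count. Fields 1-15 need 1 byte, fields 16-2047 need 2. */
static size_t key_size(uint32_t field_number)
{
    return varint_size((uint64_t)field_number << 3);
}

/* Worst-case encoded size of one scalar field, key included. */
static size_t max_int32_field_size(uint32_t field_number)
{
    return key_size(field_number) + 10; /* negative int32: 10-byte varint */
}

static size_t max_sint32_field_size(uint32_t field_number)
{
    return key_size(field_number) + 5;  /* zig-zag keeps it within 32 bits */
}

static size_t fixed32_field_size(uint32_t field_number)
{
    return key_size(field_number) + 4;  /* always exactly 4 payload bytes */
}

static size_t fixed64_field_size(uint32_t field_number)
{
    return key_size(field_number) + 8;  /* always exactly 8 payload bytes */
}
```

Under these assumptions, an int32 in field 1 costs at most 11 bytes and at least 2, while a fixed32 in the same field always costs 5. Keeping frequently used fields numbered 1 to 15 saves one byte per field.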
