Protocol Buffers - how is the extensibility and backward compatibility achieved?

Question

Please help me understand the backward compatibility and extensibility properties of the protocol buffers' internal implementation.

1) How is backwards compatibility achieved in the face of deleting data fields? I imagine that the generated data access code returns empty values for properties that are not present in the data stream, and the consumer code always has to check specifically for those empty values and act accordingly. How would the empty values be standardized?

Also in this case, how does the old code "know" that the property is not present in the data stream anymore?

I imagine that one solution would be that old data is never deleted from the internal stream specification and only replaced with empty values, but the same could probably be achieved with internal versioning on the fields.

2) Perhaps a clearer question: how does the old code know to ignore new data added by new versions of the .proto spec? This is probably somewhat more straightforward than 1): have a size field in the internal serialized structure, read only that many bytes at a time, and only append new fields at the end of the struct.

Trying to understand all this in order to extend an old data format to provide backward/forward compatibility between code and data as a side project.

Edit: formatting.

Thanks!

Bruce Martin · Accepted Answer

Some background information: in Protocol Buffers you define a field like this:

    optional string msg = 1;

The number (1 in this example) identifies the field in the data message (or data record) and is used to match it to the corresponding field in the proto message definition used by your program.

Protocol Buffers stores data messages like this:

    FieldId1 Data1
    FieldId2 Data2
        .....
    FieldIdn Datan

Here each FieldId consists of the field number and the field's wire type (which tells a parser how the data that follows is encoded and how long it is). If a field does not have any data, it is not stored in the output message (record). So you may have

    FieldId3 Data3
    FieldId7 Data7
    FieldId11 Data11
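To make the layout above concrete, here is a minimal Python sketch of how such a record could be written. It assumes the standard protobuf wire encoding, where the FieldId (tag) is a varint holding (field_number << 3) | wire_type, and a hypothetical message with "optional string msg = 1" and "optional int32 count = 2". The function names are invented for illustration and are not part of any protobuf library.

```python
WIRE_VARINT = 0  # wire type for integer fields
WIRE_LEN = 2     # wire type for length-delimited fields (strings, bytes)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """The FieldId: field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_message(msg=None, count=None) -> bytes:
    """Hypothetical message: string msg = 1; int32 count = 2.

    Fields that are not set are simply left out of the output.
    """
    out = b""
    if msg is not None:
        data = msg.encode("utf-8")
        out += encode_tag(1, WIRE_LEN) + encode_varint(len(data)) + data
    if count is not None:
        out += encode_tag(2, WIRE_VARINT) + encode_varint(count)
    return out
```

Calling encode_message(count=150) produces only the entry for field 2; nothing at all is written for the unset field 1, which is why a record can contain just FieldId3, FieldId7 and FieldId11.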

Answers to your questions:

1) In Protocol Buffers every field has one of these labels: required, optional or repeated. So to delete a field you can make it optional and not store any value in it. Some people routinely make most fields optional.
2) Protocol Buffers matches the field numbers in the data message with the field numbers in the proto definition. In Java at least, there is an unknown-fields map where any extra fields are stored.
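As a rough illustration of both answers, the following Python sketch reads a record entry by entry, keeps the fields whose numbers the reader's schema knows, and sets the rest aside. It assumes the standard protobuf wire types (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit); the parse function and its known_fields parameter are hypothetical, and real generated code is considerably more involved (for example, this sketch keeps only the last value of a repeated field).

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse(buf: bytes, known_fields: set):
    """Split a record into (known, unknown) dicts keyed by field number.

    known_fields is the set of field numbers in the reader's schema.
    The wire type in each tag says how much data to consume, so fields
    the reader has never heard of can still be stepped over safely.
    """
    known, unknown = {}, {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        target = known if field_number in known_fields else unknown
        target[field_number] = value
    return known, unknown


# A newer writer added field 3; an older reader only knows fields 1 and 2.
record = b"\x0a\x02hi\x10\x96\x01\x1a\x03new"
known, unknown = parse(record, known_fields={1, 2})
```

A field that was deleted (or never set) simply does not show up in known, so the reader falls back to a default, for example known.get(1, b""). This is also why no overall size field or append-only ordering is needed: each entry carries its own number and enough type information to be skipped.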

It is essential that you document the fields you remove (both the field name and the field number) to make sure you never reuse a field name or number.

If you reuse a field, you could break existing code.
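A small Python sketch of why reuse is dangerous, assuming the standard tag layout (field_number << 3 | wire_type). The field names nickname and password are hypothetical, and the helpers only handle single-byte tags and lengths. The .proto language's reserved statement exists to guard against exactly this kind of accident.

```python
WIRE_VARINT, WIRE_LEN = 0, 2


def make_tag(field_number: int, wire_type: int) -> int:
    # Single-byte tag; only valid for field numbers 1..15.
    return (field_number << 3) | wire_type


def split_tag(tag: int):
    return tag >> 3, tag & 0x07


# Old data, written when the schema had: optional string nickname = 2;
old_record = bytes([make_tag(2, WIRE_LEN), 3]) + b"bob"

# Case 1: number 2 is reused with a different type (say int32).
# The stored wire type no longer matches what the new reader expects.
field_number, wire_type = split_tag(old_record[0])
type_mismatch = field_number == 2 and wire_type != WIRE_VARINT


# Case 2: number 2 is reused with the same type but a new meaning,
# e.g. optional string password = 2; nothing on the wire looks wrong.
def read_password(record: bytes) -> str:
    field_number, wire_type = split_tag(record[0])
    if field_number == 2 and wire_type == WIRE_LEN:
        length = record[1]
        return record[2:2 + length].decode("utf-8")
    return ""
```

In the first case the mismatch is at least detectable. In the second, read_password(old_record) quietly returns the old nickname as if it were a password, which is the harder bug to notice.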
