Reputation: 5589
I found an interesting gotcha with protocol buffers. If you have two similar messages it is possible to parse one as if it were the other using the C++ API or the command line.
The limited documentation for ParseFromString does not mention that it need not consume all the string and will not fail if it doesn't.
I had expected ParseFromString to fail when asked to parse a message of type A but presented with a message of type B. After all, the message contains extra data. However, this is not the case. An example script demonstrates the issue:
#!/bin/sh
cat - >./foobar.proto <<EOF
syntax = "proto3";
package demo;
message A
{
uint64 foo = 1;
};
enum flagx {
y = 0;
z = 1;
}
message B {
uint64 foolish = 1;
flagx bar = 2;
};
EOF
cat - >./mess.B.in.txtfmt <<EOF
foolish: 10
bar: y
EOF
cat - >./mess.A.in.txtfmt <<EOF
foo: 10
EOF
protoc --encode=demo.A foobar.proto <./mess.A.in.txtfmt >./mess.A.proto
protoc --encode=demo.B foobar.proto <./mess.B.in.txtfmt >./mess.B.proto
protoc --decode=demo.A foobar.proto >./mess.out.txtfmt <./mess.B.proto
echo "in: "
cat mess.B.in.txtfmt
echo "out: "
cat mess.out.txtfmt
echo "xxd mess.A.proto:"
xxd mess.A.proto
echo "xxd mess.B.proto:"
xxd mess.B.proto
The output is:
in:
foolish: 10
bar: y
out:
foo: 10
xxd mess.A.proto:
00000000: 080a
xxd mess.B.proto:
00000000: 080a
So the binary message is identical for both messages A and B.
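To see why the bytes come out the same, here is a minimal sketch of the varint part of the wire format in plain Python. It does not use the protobuf library; `encode_varint` and `encode_varint_fields` are hypothetical helpers written for this illustration, and they assume only what the encoding documentation describes: a key of `(field_number << 3) | wire_type`, and proto3 scalar fields left off the wire when they hold the default value.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_fields(fields):
    """Encode {field_number: value} as proto3 varint fields (illustration only)."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if value == 0:
            continue  # proto3 leaves default-valued scalar fields off the wire
        out += encode_varint((number << 3) | 0)  # key: field number, wire type 0
        out += encode_varint(value)
    return bytes(out)


message_a = encode_varint_fields({1: 10})          # A { foo: 10 }
message_b = encode_varint_fields({1: 10, 2: 0})    # B { foolish: 10, bar: y }
message_b_z = encode_varint_fields({1: 10, 2: 1})  # B { foolish: 10, bar: z }

print(message_a.hex(), message_b.hex(), message_b_z.hex())
```

Only the field number and the value reach the wire, so `foo` and `foolish` are indistinguishable, and `bar: y` (value 0) contributes no bytes at all. Setting `bar: z` is what makes the B message differ.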
If you alter the protocol so that instead of an enum you have another varint (uint64) you get distinct binary messages but ParseFromString will still successfully parse the longer message as the shorter one.
To really confuse things it also seems to be able to parse the shorter message as the longer one.
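Both directions can be reproduced with a hand-written reader. This is a rough sketch and not the real protobuf parser: `read_varint`, `parse_as`, `SCHEMA_A` and `SCHEMA_B` are names made up for this example, it handles only varint fields, and it assumes the second field of B has been changed to a uint64 set to 20.

```python
def read_varint(data, pos):
    """Return (value, next_pos) for the varint that starts at data[pos]."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def parse_as(schema, data):
    """Parse varint-only wire data against {field_number: name}.

    Returns (known, unknown): known starts from the proto3 default of 0,
    and fields that are not in the schema are set aside instead of failing.
    """
    known = {name: 0 for name in schema.values()}
    unknown = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch handles only varint fields")
        value, pos = read_varint(data, pos)
        if number in schema:
            known[schema[number]] = value
        else:
            unknown[number] = value
    return known, unknown


SCHEMA_A = {1: "foo"}
SCHEMA_B = {1: "foolish", 2: "bar"}

long_message = bytes.fromhex("080a1014")  # B with foolish = 10, bar = 20
short_message = bytes.fromhex("080a")     # A with foo = 10

print(parse_as(SCHEMA_A, long_message))   # longer message read as A
print(parse_as(SCHEMA_B, short_message))  # shorter message read as B
```

Read as A, the longer message yields `foo` = 10 and field 2 is put to one side as unknown. Read as B, the shorter message yields `foolish` = 10 and `bar` keeps its default of 0. Neither case gives the reader any reason to report an error.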
Is this a bug or a feature?
Upvotes: 3
Views: 1753
Reputation: 5589
I think this is by design but the documentation could be better.
This confusion can arise if you use the API without first reading up on the wire format. Contrary to what you might expect, the wire format is relevant to how the API behaves.
The wire format emphasises compactness over correctness. If you want to check the correctness of a message you are invited to use other means.
You might (arguably should or must) include in your message one or more of the following:
The second point, being able to parse a shorter message as a longer one, follows from the fact that in Protocol Buffers 3 all fields are optional. Protocol Buffers 2 had the concept of a required field; its removal caused some controversy (see, for example, "Why required and optional is removed in Protocol Buffers 3" and https://capnproto.org/faq.html#how-do-i-make-a-field-required-like-in-protocol-buffers). A field that holds its default value (typically 0) is not included in the encoded message. Also, field names are replaced by field numbers on the wire. Thus a message written for one protocol can very easily be a valid message for a 'different' protocol as well.
Upvotes: 3