Reputation: 51
I'm trying to understand protocol buffer encoding method, when translating message to binary(or hexadecimal) format, I can't understand how the embedded message is encoded.
I guess maybe it's related to memory address, but I can't find the accurate relationship.
Here is what i've done.
Step 1: I defined two messages in test.proto
file,
syntax = "proto3";
package proto_test;
message Education {
string college = 1;
}
message Person {
int32 age = 1;
string name = 2;
Education edu = 3;
}
Step 2: And then I generated some go
code,
protoc --go_out=. test.proto
Step 3: Then I check the encoded format of the message,
p := proto_test.Person{
Age: 666,
Name: "Tom",
Edu: &proto_test.Education{
College: "SOMEWHERE",
},
}
var b []byte
out, err := p.XXX_Marshal(b, true)
if err != nil {
log.Fatalln("fail to marshal with error: ", err)
}
fmt.Printf("hexadecimal format:% x \n", out)
fmt.Printf("binary format:% b \n", out)
which outputs,
hexadecimal format:08 9a 05 12 03 54 6f 6d 1a fd 96 d1 08 0a 09 53 4f 4d 45 57 48 45 52 45
binary format:[ 1000 10011010 101 10010 11 1010100 1101111 1101101 11010 11111101 10010110 11010001 1000 1010 1001 1010011 1001111 1001101 1000101 1010111 1001000 1000101 1010010 1000101]
what I understand is ,
08 - int32 wire type with tag number 1
9a 05 - Varints for 666
12 - string wire type with tag number 2
03 - length delimited which is 3 byte
54 6f 6d - ascii for "TOM"
1a - embedded message wire type with tag number 3
fd 96 d1 08 - ? (here is what I don't understand)
0a - string wire type with tag number 1
09 - length delimited which is 9 byte
53 4f 4d 45 57 48 45 52 45 - ascii for "SOMEWHERE"
What does fd 96 d1 08
stands for?
It seems like that d1 08
always be there, but fd 96
sometimes change, don't know why. Thanks for answering :)
Add
I debugged the marshal process and reported a bug here.
Upvotes: 3
Views: 2327
Reputation: 1580
At that location I/you would expect the number of bytes in the embedded message.
I have repeated your experiment in Python.
msg = Person()
msg.age = 666
msg.name = "Tom"
msg.edu.college = "SOMEWHERE"
I got a different result, the one I would expect. A varint stating the size of the embedded message.
0x08
0x9A, 0x05
0x12
0x03
0x54 0x6F 0x6D
0x1A
0x0B <- Equals to 11 decimal.
0x0A
0x09
0x53 0x4F 0x4D 0x45 0x57 0x48 0x45 0x52 0x45
Next I deserialized your bytes:
msg2 = Person()
str = bytearray(b'\x08\x9a\x05\x12\x03\x54\x6f\x6d\x1a\xfd\x96\xd1\x08\x0a\x09\x53\x4f\x4d\x45\x57\x48\x45\x52\x45')
msg2.ParseFromString(str)
print(msg2)
The result of this is perfect:
age: 666
name: "Tom"
edu {
college: "SOMEWHERE"
}
The conclusion I come to is that there are some different ways of encoding in Protobuf. I do not know what is done in this case but I know the example of a negative 32 bit varint. A positive varint is encoded in five bytes, a negative value is cast as a 64 bit value and encoded to ten bytes.
Upvotes: 2