How to explain the binary wire format of an embedded message in protocol buffers?

Question

I'm trying to understand the protocol buffer encoding. When a message is serialized to its binary (or hexadecimal) form, I can't work out how an embedded message is encoded.

My guess is that it's related to a memory address, but I can't find the exact relationship.

Here is what I've done.

Step 1: I defined two messages in a test.proto file:

syntax = "proto3";
package proto_test;

message Education {
    string college = 1;
}

message Person {
    int32 age = 1;
    string name = 2;
    Education edu = 3;
}

Step 2: Then I generated the Go code:

protoc --go_out=. test.proto

Step 3: Then I checked the encoded form of the message:

p := proto_test.Person{
    Age:  666,
    Name: "Tom",
    Edu: &proto_test.Education{
        College: "SOMEWHERE",
    },
}
var b []byte
out, err := p.XXX_Marshal(b, true)
if err != nil {
    log.Fatalln("fail to marshal with error: ", err)
}
fmt.Printf("hexadecimal format:% x \n", out)
fmt.Printf("binary format:% b \n", out)

which outputs:

hexadecimal format:08 9a 05 12 03 54 6f 6d 1a fd 96 d1 08 0a 09 53 4f 4d 45 57 48 45 52 45 
binary format:[ 1000  10011010  101  10010  11  1010100  1101111  1101101  11010  11111101  10010110  11010001  1000  1010  1001  1010011  1001111  1001101  1000101  1010111  1001000  1000101  1010010  1000101]

What I understand is:

08                         - field number 1, wire type 0 (varint)
9a 05                      - varint encoding of 666
12                         - field number 2, wire type 2 (length-delimited)
03                         - length: 3 bytes
54 6f 6d                   - ASCII for "Tom"
1a                         - field number 3, wire type 2 (length-delimited, the embedded message)
fd 96 d1 08                - ? (this is the part I don't understand)
0a                         - field number 1 of Education, wire type 2 (length-delimited)
09                         - length: 9 bytes
53 4f 4d 45 57 48 45 52 45 - ASCII for "SOMEWHERE"

What does fd 96 d1 08 stand for? It seems that d1 08 is always there, but fd 96 sometimes changes, and I don't know why. Thanks for answering :)

Add

I debugged the marshal process and reported a bug here.

