How does the Google protobuf format reduce the size of an object after it is encoded?

Question

package sample;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.SerializationUtils;

import sample.ProtoObj.Attachment;

public class Main {

    public static void main(String[] args) {
        // Build the plain Java object and serialize it with Java serialization
        POJO pojo = new POJO();
        pojo.setContent("content");
        List<POJO.Attachment> att = new ArrayList<>();
        POJO.Attachment attach = pojo.new Attachment();
        attach.setName("Attachment Name");
        attach.setId("0e068652dbd9");
        attach.setSize(1913558);
        att.add(attach);
        pojo.setAttach(att);
        byte[] pojoBytes = SerializationUtils.serialize(pojo);
        System.out.println("Size of the POJO ::: " + pojoBytes.length);

        // Build the same data in the protobuf-backed object and encode it
        ProtoObj tc = new ProtoObj();
        List<Attachment> attachList = new ArrayList<>();
        Attachment attach1 = tc.new Attachment();
        attach1.setName("Attachment Name");
        attach1.setId("0e068652dbd9");
        attach1.setSize(1913558);
        attachList.add(attach1);
        tc.setContent("content");
        tc.setAttach(attachList);

        byte[] protoBytes = tc.getProto(tc);
        System.out.println("Size of the PROTO ::: " + protoBytes.length);
    }

}

I used the program above to compare the size of the same data serialized as a POJO (with SerializationUtils) and encoded with Protobuf. Both objects hold the same data, but the output shows a drastic difference in size.

Output:

Size of the POJO ::: 336
Size of the PROTO ::: 82

I have also read the page below to learn how the Google protobuf encoding affects the size of the encoded object.

https://developers.google.com/protocol-buffers/docs/encoding

But I'm unable to understand it. Could someone please explain it in simple terms?

hris.to · Accepted Answer

Protobuf doesn't send the schema along with the data, so both sides need to have the schema in order to deserialise the data they receive.

Because of that, the encoder can optimise and put each field's value right after the previous one, identified only by a small numeric tag rather than by its name. Conceptually, something like this:

AttachmentName0e068652dbd91913558

And all of this is in binary format. In JSON, the same data would look like:

{"name": "AttachmentName", "id": "0e068652dbd9", "size": "1913558"}

As you can see, in JSON the schema (the field names) is encoded in the serialised message itself.
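To make this concrete, here is a minimal sketch that hand-encodes one attachment following the documented protobuf wire format (a tag byte, then either a varint or a length-prefixed string). The field numbers 1, 2 and 3 and the class name WireSketch are assumptions made for illustration, since the question does not show the .proto file; real code would use the classes generated by protoc instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class WireSketch {

    // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A tag is (fieldNumber << 3) | wireType, written as a varint. No field name is stored.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Wire type 2 (length-delimited): tag, byte length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Assumed schema: string name = 1; string id = 2; int32 size = 3;
    static byte[] encodeAttachment(String name, String id, int size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeString(out, 2, id);
        writeTag(out, 3, 0);      // wire type 0: varint
        writeVarint(out, size);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] proto = encodeAttachment("Attachment Name", "0e068652dbd9", 1913558);
        String json = "{\"name\": \"Attachment Name\", \"id\": \"0e068652dbd9\", \"size\": 1913558}";
        System.out.println("Hand-encoded protobuf bytes: " + proto.length);
        System.out.println("JSON bytes: " + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Under these assumptions the attachment takes 35 bytes: two bytes of overhead per string (tag plus length), one tag byte for the number, and only three bytes for 1913558 because a varint drops the unused high-order bytes.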

I'm not completely familiar with Java's SerializationUtils, but I think it also encodes the schema (class and field names) along with the data, and that's why you see this size difference.
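One way to check this is to look inside the serialized bytes. The sketch below uses the standard java.io.ObjectOutputStream, which is what commons-lang's SerializationUtils.serialize is built on, and searches the output for the class and field names. The Attachment class here is a hypothetical stand-in for the question's POJO, not the asker's actual class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SerializationSketch {

    // Hypothetical stand-in for the attachment in the question.
    static class Attachment implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "Attachment Name";
        String id = "0e068652dbd9";
        int size = 1913558;
    }

    // Standard Java serialization into an in-memory byte array.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // ISO-8859-1 maps every byte to exactly one char, so this is a plain byte search.
    static boolean contains(byte[] haystack, String needle) {
        return new String(haystack, StandardCharsets.ISO_8859_1).contains(needle);
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Attachment());
        System.out.println("Serialized size: " + data.length);
        System.out.println("Contains class name: " + contains(data, "SerializationSketch$Attachment"));
        System.out.println("Contains field name 'size': " + contains(data, "size"));
        System.out.println("Contains field type: " + contains(data, "java/lang/String"));
    }
}
```

The stream carries a class descriptor (class name, serialVersionUID, field names and field types) in addition to the values, which is metadata that a protobuf message leaves to the shared schema.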
