Lan

Reputation: 6660

Convert protobuf to Avro

I am trying to convert a protobuf object to Avro. I am using the following code:

    // myProto is deserialized using the Google protobuf API
    ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
    FileOutputStream fo = new FileOutputStream(args[0]);
    Encoder e = EncoderFactory.get().binaryEncoder(fo, null);
    pbWriter.write(myProto, e);
    fo.flush();

The file was created successfully, and if I cat it I can see the data. However, when I use avro-tools to get the schema or metadata of the saved file, it fails with:

Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)

Looking at the Avro source code, the error means that the first 4 bytes of the file do not match the expected MAGIC bytes. I am trying to figure out whether I have done something wrong.

Appreciate any help you can give me.

Upvotes: 5

Views: 9733

Answers (1)

Lan

Reputation: 6660

I figured out why my code was not working. Instead of using ProtobufDatumWriter to write to the file directly, we should wrap it in a DataFileWriter, which writes the Avro object container file format, including the header that avro-tools expects.

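    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.protobuf.ProtobufData;
    import org.apache.avro.protobuf.ProtobufDatumWriter;

    ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
    DataFileWriter<MyProto> dataFileWriter = new DataFileWriter<MyProto>(pbWriter);
    Schema schema = ProtobufData.get().getSchema(MyProto.class);
    dataFileWriter.create(schema, new File("test.avro"));
    dataFileWriter.append(myProto);
    dataFileWriter.close();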
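
For anyone hitting the same "Not a data file" error, a quick way to tell the two outputs apart is to look at the first four bytes of the file. An Avro object container file starts with the ASCII characters "Obj" followed by the byte 1, while the raw binary-encoded output from the question does not. The following is only a minimal sketch using the JDK alone (Java 11+ assumed). The class name AvroMagicCheck and its methods are made up for illustration and are not part of Avro.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of the Avro library.
final class AvroMagicCheck {
    // Magic bytes of an Avro object container file: ASCII "Obj" followed by the byte 1.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes begin with the Avro container magic. */
    static boolean hasAvroMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    /** Reads the first four bytes of a file and checks them against the magic. */
    static boolean hasAvroMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasAvroMagic(in.readNBytes(MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect.
        Path file = Path.of(args[0]);
        System.out.println(file + (hasAvroMagic(file) ? " is" : " is not")
                + " an Avro container file (by magic bytes)");
    }
}
```

This only checks the magic bytes. It does not validate the schema or the data blocks, so avro-tools is still the right tool for actually reading the file.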

Upvotes: 6
