Tituuss10

Reputation: 92

Why can't Avro take the schema from the .avro file?

Here is the deserializer example from TutorialsPoint:

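import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class Deserialize {
   public static void main(String[] args) throws Exception {

      // Parse the reader schema from the separate .avsc file
      Schema schema = new Schema.Parser().parse(new File("/home/Hadoop/Avro/schema/emp.avsc"));
      DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);

      // try-with-resources closes the data file when reading is done
      try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(
            new File("/home/Hadoop/Avro_Work/without_code_gen/mydata.txt"), datumReader)) {
         GenericRecord emp = null;
         while (dataFileReader.hasNext()) {
            // Reuse the same record object on each iteration to reduce allocations
            emp = dataFileReader.next(emp);
            System.out.println(emp);
         }
      }
      System.out.println("hello");
   }
}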

My question is: if the .avro file already contains a schema, why do I have to pass the schema as well? Having to provide the schema separately just to parse the file is very inconvenient.

Upvotes: 0

Views: 475

Answers (1)

OneCricketeer

Reputation: 192023

Avro's schema resolution uses two schemas: a reader schema and a writer schema.
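As a rough illustration of what resolution means for records, here is a toy Java sketch. It is not Avro's implementation: it models records as plain maps, ignores types, unions, aliases and type promotion, treats a null default as "no default" for simplicity, and the class, field names and values are all made up. It only shows the core record rule from the Avro specification: fields are matched by name, a reader field missing from the writer takes the reader's default, and writer-only fields are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of Avro record schema resolution (not Avro's own code). */
public class ResolutionSketch {

    /** A reader-schema field: its name and default value (null here means "no default"). */
    static class Field {
        final String name;
        final Object defaultValue;

        Field(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    /**
     * Projects a record written with the writer schema onto the reader's fields:
     * fields present in both are matched by name, reader-only fields take their
     * default, and writer-only fields are skipped.
     */
    static Map<String, Object> resolve(Map<String, Object> written, List<Field> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : readerFields) {
            if (written.containsKey(f.name)) {
                result.put(f.name, written.get(f.name));
            } else if (f.defaultValue != null) {
                result.put(f.name, f.defaultValue);
            } else {
                // The spec signals an error when a reader field has no writer value and no default
                throw new IllegalArgumentException("no value or default for field " + f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Record as written with hypothetical writer fields {name, id, salary}
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("name", "omar");
        written.put("id", 1);
        written.put("salary", 30000);

        // Hypothetical reader fields {name, id, dept}, where dept defaults to "unknown"
        List<Field> reader = List.of(
                new Field("name", null),
                new Field("id", null),
                new Field("dept", "unknown"));

        System.out.println(resolve(written, reader)); // {name=omar, id=1, dept=unknown}
    }
}
```

The writer-only salary field is dropped and the reader-only dept field is filled from its default, which is why Avro needs to know both schemas to decode data into the shape the reader expects.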

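The writer schema is stored in the header of the .avro file. If you don't pass a reader schema (for example, new GenericDatumReader<>() with no arguments), DataFileReader uses the writer schema from the file as the reader schema too, so the .avsc file is only needed when you want to read the data with a different (for example, evolved) schema.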
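To make "the writer schema is in the file" concrete, here is a minimal stdlib-only sketch that pulls the schema JSON out of the header of an Avro object container file. It assumes the header layout described in the Avro specification: the magic bytes 'O', 'b', 'j', 1, then a metadata map (zig-zag varint counts, length-prefixed keys and values) whose avro.schema entry holds the schema. The class and method names are hypothetical, compression codecs and the data blocks are not handled, and in practice Avro's own DataFileReader.getSchema() is the normal way to do this.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reads the writer schema from an Avro container file header. */
public class AvroHeaderSchema {

    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Reads one Avro "long": a zig-zag encoded variable-length integer. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo the zig-zag mapping
    }

    /** Reads an Avro "bytes" or "string" value: a long length followed by that many bytes. */
    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        in.readFully(buf);
        return buf;
    }

    /** Parses the header metadata map and returns the "avro.schema" entry as a string. */
    static String readSchemaJson(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");

        Map<String, byte[]> meta = new LinkedHashMap<>();
        // The map is a series of blocks; a block count of 0 ends it
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {   // negative count: absolute value is the count, a byte size follows
                count = -count;
                readLong(in);  // skip the block byte size; entries are read one by one anyway
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        byte[] schema = meta.get("avro.schema");
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return new String(schema, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java AvroHeaderSchema path/to/file.avro
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

Only the header is read, which is cheap: the schema sits before the first data block, so getting it never requires scanning the records.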

You can also parse the schema out of the file:

String filepath = ...;
// getSchema() returns the writer schema stored in the file header
try (DataFileReader<Void> reader = new DataFileReader<>(Util.openSeekableFromFS(filepath),
    new GenericDatumReader<>())) {
  System.out.println(reader.getSchema().toString(true));
}

This is how java -jar avro-tools.jar getschema works.

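You may need to copy the Util.openSeekableFromFS method into your own code, since it appears to be package-private. For a plain local file, passing new File(filepath) to the DataFileReader constructor works as well.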

Upvotes: 1
