mba12

Reputation: 2782

How to extract schema from an avro file in Java

How do you extract first the schema and then the data from an Avro file in Java? This is identical to this question, except in Java.

I've seen examples of how to get the schema from an .avsc file, but not from an .avro data file. What direction should I be looking in?

Schema schema = new Schema.Parser().parse(
    new File("/home/Hadoop/Avro/schema/emp.avsc")
);

Upvotes: 28

Views: 34214

Answers (3)

Eugene

Reputation: 11055

Thanks to @Helder Pereira for the answer. As a complement, the schema can also be obtained by calling getSchema() on a GenericRecord instance.
Here is a live demo of it; the link above shows how to get the data and schema in Java for the Parquet, ORC, and Avro data formats.
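
As a rough sketch of the getSchema() approach: the class name RecordSchemaDemo and the helper schemaOfFirstRecord are made up for illustration, and the code assumes the Apache Avro Java library is on the classpath and that the file's top-level schema is a record (otherwise the cast to GenericRecord would fail).

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Hypothetical demo class; not part of any answer in this thread.
public class RecordSchemaDemo {

    /** Returns the schema of the first record, or null if the file holds no records. */
    static Schema schemaOfFirstRecord(File avroFile) throws IOException {
        // try-with-resources closes the reader even if reading fails.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
            if (!reader.hasNext()) {
                return null;
            }
            GenericRecord record = reader.next();
            // Every GenericRecord carries the schema it was decoded with.
            return record.getSchema();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RecordSchemaDemo <file.avro>");
            return;
        }
        Schema schema = schemaOfFirstRecord(new File(args[0]));
        // toString(true) pretty-prints the schema JSON.
        System.out.println(schema == null ? "no records" : schema.toString(true));
    }
}
```

Note that this only works when the file contains at least one record; for an empty file, the reader's own getSchema() is the safer route.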

Upvotes: 2

Helder Pereira

Reputation: 5756

If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:

DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema);

And then you can read the data inside the file:

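GenericRecord record = null;
while (dataFileReader.hasNext()) {
    // Passing the previous record lets Avro reuse the object instead of allocating a new one.
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close();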
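
This works without generated classes because an Avro object container file stores the writer's schema as JSON in its header, under the metadata key avro.schema. To illustrate, here is a minimal sketch that pulls that JSON out using only the JDK. The class AvroHeaderSchema and its methods are hypothetical names, and the parsing follows my reading of the container file layout (magic bytes, then a metadata map, then a sync marker); in practice DataFileReader is the better choice.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: reads only the header of an Avro container file.
public class AvroHeaderSchema {

    // Container files start with the bytes 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Reads one zigzag-encoded variable-length long. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;      // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);     // undo zigzag encoding
    }

    /** Reads a length-prefixed byte sequence (used for both keys and values). */
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    /** Returns the schema JSON stored under "avro.schema" in the file header. */
    static String readSchemaJson(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");

        String schema = null;
        // The metadata map is a series of blocks; a block count of 0 ends it.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {                 // negative count: a byte size follows
                count = -count;
                readLong(in);                // skip the block's byte size
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if ("avro.schema".equals(key)) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return schema;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: AvroHeaderSchema <file.avro>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

The sketch stops after the metadata map and never touches the data blocks, so it stays cheap even for large files.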

Upvotes: 44

Carlos Bribiescas

Reputation: 4397

You can use the Databricks spark-avro library (https://github.com/databricks/spark-avro), which will load the Avro file into a DataFrame (Dataset<Row>).

Once you have a Dataset<Row>, you can get the schema directly using df.schema(). Note that this returns a Spark StructType rather than an Avro Schema.

Upvotes: 1
