Reputation: 2782
How do you extract first the schema and then the data from an Avro file in Java? Identical to this question, except in Java.
I've seen examples of how to get the schema from an .avsc file, but not from an Avro data file. What direction should I be looking in?
Schema schema = new Schema.Parser().parse(
new File("/home/Hadoop/Avro/schema/emp.avsc")
);
Upvotes: 28
Views: 34214
Reputation: 11055
Thanks to @Helder Pereira for the answer. As a complement, the schema can also be fetched by calling getSchema() on a GenericRecord instance.
Here is a live demo of it; the link above shows how to get the data and schema in Java for the Parquet, ORC and Avro data formats.
Upvotes: 2
Reputation: 5756
If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema);
And then you can read the data inside the file:
GenericRecord record = null;
while (dataFileReader.hasNext()) {
    record = dataFileReader.next(record);
    System.out.println(record);
}
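As background on why no schema has to be passed to the reader: an Avro object container file carries its writer schema as JSON in the file header, under the metadata key avro.schema. The following is a minimal stdlib-only sketch of reading that header, assuming the layout described in the Avro specification (magic bytes, a metadata map, then a 16-byte sync marker). The class name AvroHeaderSchema and its method names are hypothetical, made up for this illustration; in real code DataFileReader does this for you.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Avro library.
final class AvroHeaderSchema {

    // Decodes one Avro long: variable-length base-128 with zig-zag encoding.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;      // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);     // undo zig-zag
    }

    // Avro "bytes" and "string" values: a long length followed by that many bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    // Reads the magic bytes and the header's metadata map (string keys, bytes values).
    static Map<String, byte[]> readMetadata(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // A map is a series of blocks; a block count of 0 ends the map.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {                 // negative count: a byte size follows
                count = -count;
                readLong(in);                // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    // Returns the writer schema as a JSON string.
    static String readSchemaJson(InputStream in) throws IOException {
        byte[] schema = readMetadata(in).get("avro.schema");
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return new String(schema, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: AvroHeaderSchema <file.avro>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

Only the header is read here, so this works on large files without touching the data blocks. For reading the records themselves, stick with DataFileReader as shown above, since the blocks may be compressed with the codec named under avro.codec.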
Upvotes: 44
Reputation: 4397
You can use the Databricks spark-avro library, as shown at https://github.com/databricks/spark-avro, which will load the Avro file into a DataFrame (Dataset<Row>).
Once you have a Dataset<Row>, you can get the schema directly using df.schema().
Upvotes: 1