Reputation: 1647
Is there a way to generate schemaless Avro from Apache Spark? I can see ways to do it from Java/Scala using the Apache Avro library, and through Confluent Avro. When I write Avro from Spark as shown below, the output files include the schema. I want to write the data without the schema to reduce the size of the final dataset.
df.write.format("avro").save("person.avro")
Upvotes: 0
Views: 658
Reputation: 18003
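You need not worry, and with this writer you cannot avoid it anyway.
An Avro data file, which is what Spark's avro format writes, always contains both the data and the schema.
Avro differs from JSON here: JSON effectively repeats the schema in every record, because the field names are stored alongside the values in the data itself.
With Avro the schema is stored once per file, in the file header, so the overhead is small and does not grow with the number of records.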
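To put rough numbers on that, here is a minimal sketch in plain Python. It does not use Spark or the Avro library and does not reproduce Avro's binary encoding. It only compares a one-time schema cost with the cost of repeating field names in every record. The `Person` schema, the sample record, and the record count are made up for illustration.

```python
import json

# Hypothetical schema and data, for illustration only.
schema = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
record = {"first_name": "Ada", "last_name": "Lovelace", "age": 36}
record_count = 100_000

# One-time cost: the schema text, as it would sit once in a file header.
schema_bytes = len(json.dumps(schema).encode("utf-8"))

# Per-record cost of carrying the field names with every record,
# measured as (record with names) minus (same values without names).
with_names = len(json.dumps(record).encode("utf-8"))
values_only = len(json.dumps(list(record.values())).encode("utf-8"))
repeated_name_bytes = (with_names - values_only) * record_count

print(f"schema, stored once: {schema_bytes} bytes")
print(f"field names, repeated per record: {repeated_name_bytes} bytes")
```

The first figure stays fixed however many records the file holds, while the second grows linearly with the record count. That is why dropping the schema from the file would save very little on a dataset of any real size.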
Upvotes: 2