I need to read sequence files that contain messages with different schemas:
Split them by message type, then write the result to Avro:
Using Spark, I'm able to read and parse the data:
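import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.io.{BytesWritable, LongWritable}

// `case` is needed to destructure the (key, value) tuple in Scala 2
spark.sparkContext.sequenceFile(path, classOf[LongWritable], classOf[BytesWritable]).map { case (key, value) =>
  val record: GenericRecord = parseValue(value)
  val messageType = extractType(record)
  messageType -> record
}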
But from there I'm not sure how to proceed. I've tried a sequence file writer, but I wasn't able to produce a proper Avro file.
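For reference, here is a minimal sketch of the kind of output I'm after, written with Avro's own `DataFileWriter` rather than a sequence file writer. The helper name `splitByType` is hypothetical, and it assumes that `extractType` returns a `String`, that all records of one message type share a single schema, and that `outDir` already exists. It runs on a plain iterator; wiring it into Spark (for example per partition, with distinct file names) is not shown.

```scala
import java.io.File
import scala.collection.mutable
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}

// Hypothetical helper: writes one Avro container file per message type.
// Assumes every record of a given type carries the same schema.
def splitByType(pairs: Iterator[(String, GenericRecord)], outDir: File): Unit = {
  // One open writer per message type, created on first use
  val writers = mutable.Map.empty[String, DataFileWriter[GenericRecord]]
  try {
    pairs.foreach { case (messageType, record) =>
      val writer = writers.getOrElseUpdate(messageType, {
        val schema = record.getSchema
        // create() writes the file header, including the schema
        new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
          .create(schema, new File(outDir, s"$messageType.avro"))
      })
      writer.append(record)
    }
  } finally {
    // Close every writer so buffered blocks are flushed to disk
    writers.values.foreach(_.close())
  }
}
```

Each resulting `<messageType>.avro` file embeds its schema in the header, which is what a sequence file writer does not give me.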