Maxence Cramet

Reputation: 594

Use Spark in Scala to read sequence files containing multiple schemas and write to Avro

I need to read sequence files that contain messages with different schemas, split the messages by message type, and then write the result to Avro.

Using Spark, I'm able to read and parse the data:

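import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.io.{BytesWritable, LongWritable}

spark.sparkContext
  .sequenceFile(path, classOf[LongWritable], classOf[BytesWritable])
  .map { case (_, value) =>
    val record: GenericRecord = parseValue(value)
    val messageType = extractType(record)
    messageType -> record
  }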

From there I'm not sure how to proceed. I tried a SequenceFile writer, but I was not able to produce a proper Avro file.
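
One possible direction, as an untested sketch rather than a confirmed solution: keep the (messageType -> record) pairs, collect the distinct message types on the driver, and write one Avro output per type with `AvroKeyOutputFormat` from the `avro-mapred` artifact. The sketch assumes that `messageType` is a `String`, that `avro-mapred` is on the classpath, and that a schema can be looked up per type on the driver. `SplitByTypeToAvro`, `outputDir`, `schemaFor` and the `type=<name>` directory layout are hypothetical names introduced only for illustration.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD

object SplitByTypeToAvro {
  // Hypothetical layout: one output directory per message type.
  def outputDir(base: String, messageType: String): String =
    s"$base/type=$messageType"

  // `typed` is the (messageType -> record) RDD built in the question.
  // `schemaFor` is a hypothetical lookup that runs on the driver only.
  def write(typed: RDD[(String, GenericRecord)],
            schemaFor: String => Schema,
            base: String): Unit = {
    typed.cache() // the RDD is scanned once per message type
    val types = typed.keys.distinct().collect()
    types.foreach { t =>
      // Each type gets its own Job so it can carry its own writer schema.
      val job = Job.getInstance(typed.sparkContext.hadoopConfiguration)
      AvroJob.setOutputKeySchema(job, schemaFor(t))
      typed
        .filter { case (k, _) => k == t }
        .map { case (_, r) => (new AvroKey[GenericRecord](r), NullWritable.get()) }
        .saveAsNewAPIHadoopFile(
          outputDir(base, t),
          classOf[AvroKey[GenericRecord]],
          classOf[NullWritable],
          classOf[AvroKeyOutputFormat[GenericRecord]],
          job.getConfiguration)
    }
    typed.unpersist()
  }
}
```

With the RDD from the snippet above, the call would look like `SplitByTypeToAvro.write(typed, schemaFor, "/some/output/base")`. Filtering per type means one pass over the cached data for each message type, which is likely acceptable only when the number of types is small. It does keep the `GenericRecord` values out of any shuffle, since only the type names go through `distinct()`.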

Upvotes: 0

Views: 19

Answers (0)
