Ramakrishna

Spark: Avro RDD to CSV


I am able to read an Avro file into avroRDD and am trying to convert it into csvRDD, where each record holds all of its values separated by commas. With the following code I can read a single field into csvRDD:

val csvRDD = avroRDD.map { case (u, _) => u.datum.get("empname") }

How can I read all the values into csvRDD without specifying field names? The resulting csvRDD should contain records like these:

(100,John,25,IN)
(101,Ricky,38,AUS)
(102,Chris,68,US)


Answers (1)

Marcel Krcah

Using Spark 1.2+ with the spark-avro integration library by Databricks, one can convert Avro data into an RDD of CSV lines as follows:

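import org.apache.spark.sql.SQLContext
import com.databricks.spark.avro._ // adds avroFile(...) to SQLContext

val sqlContext = new SQLContext(sc)
val episodes = sqlContext.avroFile("episodes.avro")
// each row's values, in schema order, joined with commas
val csv = episodes.map(_.mkString(","))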

Running csv.collect().foreach(println) on the sample episodes.avro file prints:

The Eleventh Hour,3 April 2010,11
The Doctor's Wife,14 May 2011,11
Horror of Fang Rock,3 September 1977,4
An Unearthly Child,23 November 1963,1
The Mysterious Planet,6 September 1986,6
Rose,26 March 2005,9
...
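One caveat with `mkString(",")`: a value that itself contains a comma, a double quote, or a line break yields a malformed CSV line, and null values come out as the literal text `null`. Below is a minimal sketch of a quoting helper, assuming RFC 4180-style quoting is what you want. `CsvLine` is a hypothetical name, not part of Spark or spark-avro.

```scala
// Hypothetical helper (not part of Spark or spark-avro): renders one record's
// values as a single CSV line, quoting fields in the style of RFC 4180.
object CsvLine {

  // A field must be quoted if it contains the delimiter, a double quote,
  // or a line break.
  private def needsQuoting(s: String): Boolean =
    s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r')

  // Render one value: null becomes an empty field, embedded quotes are doubled.
  def field(value: Any): String = {
    val s = if (value == null) "" else value.toString
    if (needsQuoting(s)) "\"" + s.replace("\"", "\"\"") + "\"" else s
  }

  // Join all values of a record, in order, with commas.
  def apply(values: Iterable[Any]): String =
    values.map(field).mkString(",")
}
```

With the `episodes` data above it could be used as `episodes.map(row => CsvLine(row.toSeq))`. For the `avroRDD` from the question, assuming (as `u.datum` suggests) that the keys are `AvroKey[GenericRecord]`, the values can be read by position from each record's schema instead of by name, e.g. `avroRDD.map { case (u, _) => val r = u.datum; CsvLine((0 until r.getSchema.getFields.size).map(i => r.get(i))) }`. This is an untested sketch.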

