Reputation: 1200
I am able to read an Avro file into avroRDD and am trying to convert it into csvRDD, which should contain all the values comma-separated. With the following code I am able to read a specific field into csvRDD:
val csvRDD = avroRDD.map { case (u, _) => u.datum.get("empname") }
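For reference, avroRDD is read with Avro's Hadoop input format along these lines (a minimal sketch; the file path here is only illustrative):
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// Each element is an (AvroKey[GenericRecord], NullWritable) pair,
// which is why the map above pattern-matches on (u, _) and calls u.datum
val avroRDD = sc.newAPIHadoopFile(
  "employees.avro", // illustrative path
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable])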
How can I read all the values into csvRDD instead of specifying field names? The resulting csvRDD should contain records as follows:
(100,John,25,IN)
(101,Ricky,38,AUS)
(102,Chris,68,US)
Upvotes: 0
Views: 983
Reputation: 622
Using Spark 1.2+ with the Spark-Avro integration library from Databricks, you can convert an Avro RDD to a CSV RDD as follows:
import org.apache.spark.sql.SQLContext
import com.databricks.spark.avro._

val sqlContext = new SQLContext(sc)
val episodes = sqlContext.avroFile("episodes.avro")
val csv = episodes.map(_.mkString(","))
Running csv.collect().foreach(println) using this sample Avro file prints:
The Eleventh Hour,3 April 2010,11
The Doctor's Wife,14 May 2011,11
Horror of Fang Rock,3 September 1977,4
An Unearthly Child,23 November 1963,1
The Mysterious Planet,6 September 1986,6
Rose,26 March 2005,9
...
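If you prefer to stay with your original avroRDD of GenericRecords (without spark-avro), a sketch of the same idea is to walk the record's schema fields instead of naming each one; this assumes avroRDD holds (AvroKey[GenericRecord], NullWritable) pairs as in the question:
import scala.collection.JavaConverters._

val csvRDD = avroRDD.map { case (u, _) =>
  val record = u.datum
  // Emit every field of the record, in schema order, comma separated
  record.getSchema.getFields.asScala
    .map(f => record.get(f.name))
    .mkString(",")
}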
Upvotes: 1