Reputation: 131
I'm running a Spark job in Scala and I'm stuck on parsing the input file.
The input file (tab-separated) looks something like this:
date=20160701 name=mike age=26
date=20160402 name=john age=33
I want to parse it and extract only the values, not the keys, like this:
20160701 mike 26
20160402 john 33
How can this be achieved in Scala?
I'm using:
Scala version: 2.11
Upvotes: 0
Views: 113
Reputation: 425
val rdd = sc.textFile("...")                  // input path
rdd.map(_.split("\t"))                        // split each line into key=value fields
   .map(_.map(_.split("=")(1)))               // keep only the value of each field
   .map(_.mkString("\t"))                     // re-join the values with tabs
   .saveAsTextFile("...")                     // output path
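If some fields might lack an "=", a slightly more defensive variant (a sketch, not from the original answer) can fall back to the raw field:
val parsed = rdd.map(_.split("\t").map { field =>
  field.split("=", 2) match {
    case Array(_, value) => value   // normal key=value field
    case _               => field   // no "=": keep the raw field as a fallback
  }
}.mkString("\t"))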
Upvotes: 1
Reputation: 41769
Test data
val data = "date=20160701\tname=mike\tage=26\ndate=20160402\tname=john\tage=33\n"
One statement to do what you asked
val rdd = sc.parallelize(data.split('\n'))
  .map(_.split('\t')              // split each line into key=value fields
    .map(_.split('=')(1)))        // split each field at "=" and keep only the value
Display what we got
rdd.collect().foreach(r => println(r.mkString(",")))
// 20160701,mike,26
// 20160402,john,33
But don't do this in real code. It's very fragile in the face of data-format errors and the like. Use CSVParser or something similar instead, as Narendra Parmar suggests.
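For example, here is a sketch of one such alternative, assuming Spark 2.x with a SparkSession named spark (the paths and column handling are placeholders, not part of the original answer):
val df = spark.read
  .option("delimiter", "\t")
  .csv("...")                     // columns _c0, _c1, _c2 hold the key=value fields
val values = df.rdd
  .map(_.toSeq.map(_.toString.split("=", 2).last).mkString("\t"))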
Upvotes: 1
Reputation: 1409
You can use CSVParser(). Since you know the position of each key, it will be easy and clean.
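For illustration, a sketch of how that might look, assuming the opencsv library (which provides a CSVParser class); the input path is a placeholder:
import com.opencsv.CSVParserBuilder

val lines = sc.textFile("...")
val values = lines.mapPartitions { it =>
  val parser = new CSVParserBuilder().withSeparator('\t').build()   // one parser per partition
  it.map(parser.parseLine(_).map(_.split("=", 2).last).mkString("\t"))
}
Using mapPartitions builds the parser once per partition rather than once per line, which also sidesteps any serialization concerns with the parser object.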
Upvotes: 1