Mahadevan

Reputation: 131

I need help parsing a file in Scala for running a Spark job

I'm running a Spark job in Scala and I'm stuck on parsing the input file.

The input file (TAB-separated) looks something like this:

date=20160701 name=mike age=26

date=20160402 name=john age=33

I want to parse it and extract only the values, not the keys, like this:

20160701 mike 26

20160402 john 33

How can this be achieved in Scala?

I'm using:

Scala version: 2.11

Upvotes: 0

Views: 113

Answers (3)

Hutashan Chandrakar

Reputation: 425

val rdd = sc.textFile("input.tsv") // input path (placeholder)
rdd.map(_.split("\t").map(_.split("=")(1)).mkString("\t")) // keep only the value of each key=value field
   .saveAsTextFile("output") // output directory (placeholder)
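Applied to the sample input, this produces the expected tab-separated output (20160701 mike 26, then 20160402 john 33).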

Upvotes: 1

The Archetypal Paul

Reputation: 41769

Test data

val data = "date=20160701\tname=mike\tage=26\ndate=20160402\tname=john\tage=33\n"

One statement to do what you asked

val rdd = sc.parallelize(data.split('\n'))
            .map(_.split('\t') // split into key=value
                  .map(_.split('=')(1))) // split those at "=" and select only the value

Display what we got

rdd.collect().foreach(r=>println(r.mkString(",")))
// 20160701,mike,26
// 20160402,john,33

But don't do this in real code. It's very fragile in the face of data-format errors, etc. Use a CSV parser or something similar instead, as Narendra Parmar suggests. If you do stay with plain splitting, the sketch below at least drops malformed fields instead of throwing.
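For illustration, a slightly more defensive variant of the same pipeline (a sketch, not production code): any field that doesn't have the key=value shape is simply skipped.

val safeRdd = sc.parallelize(data.split('\n'))
  .map(_.split('\t').flatMap { field =>
    field.split('=') match {
      case Array(_, value) => Some(value) // well-formed key=value field
      case _               => None        // drop malformed fields
    }
  })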

Upvotes: 1

Narendra Parmar

Reputation: 1409

You can use CSVParser(), and since you know the location of each key, it will be easy and clean. A sketch follows.
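A minimal sketch of that idea, assuming the CSVParser from Apache Commons CSV (the answer doesn't name a specific library); the input path is a placeholder:

import org.apache.commons.csv.{CSVFormat, CSVParser}
import scala.collection.JavaConverters._

// Parse one tab-delimited line and keep only the value of each key=value field.
def parseLine(line: String): Seq[String] = {
  val parser = CSVParser.parse(line, CSVFormat.TDF) // TDF = tab-delimited format
  parser.getRecords.asScala.headOption
    .map(_.asScala.map(_.split("=")(1)).toSeq)
    .getOrElse(Seq.empty)
}

val rdd = sc.textFile("input.tsv").map(parseLine) // placeholder input path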

Upvotes: 1
