Unfortunately, we still have to use Spark 1.0.0 and need to work with RDDs. I have an RDD that is created from a CSV file:
val serialRDD = sc.textFile(path)
If we print each line of the RDD, we get something like this (an id and a string):
1929 abc
2384 def
8753 ghi
3893 jkl
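Each element of the RDD returned by sc.textFile is a whole line (a String), so getting at the id and the string separately means splitting the line first. Below is a minimal sketch. It assumes the fields are separated by a single space, as in the printout above; a real CSV file would more likely use a comma. `parseLine` is a hypothetical helper name, not part of Spark.

```scala
object ParseLines {
  // Hypothetical helper: split one line such as "1929 abc" into (id, name).
  // Assumes "<id> <name>" separated by a single space; change the delimiter for real CSV input.
  def parseLine(line: String): (String, String) = {
    val fields = line.trim.split(" ", 2) // limit 2: everything after the first space is the name
    (fields(0), fields(1))
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1929 abc", "2384 def", "8753 ghi", "3893 jkl")
    lines.map(parseLine).foreach(println)
  }
}
```

On the RDD itself, the same function should be usable as serialRDD.map(ParseLines.parseLine), giving an RDD of (id, name) pairs.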
I want to add another column holding a second id: a string of the form "SERIAL-<RANK>", where RANK is 1, 2, 3, etc., incrementing by 1 for each row.
The output should look like this:
1929 abc SERIAL-1
2384 def SERIAL-2
8753 ghi SERIAL-3
3893 jkl SERIAL-4
How can I do this with RDDs?
You can use zipWithIndex and map to get it done:
// serialRDD is an RDD[String], so each element is a whole line: append the serial to it
serialRDD.zipWithIndex.map { case (line, i) => s"$line SERIAL-${i + 1}" }
I used string interpolation to build the SERIAL-X string. I also added 1 to the index because zipWithIndex starts at index 0.
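Spark's RDD.zipWithIndex pairs each element with a Long index. According to the Spark API docs, the ordering follows the partition index first and then the element order within each partition. It also launches a job to compute partition sizes when the RDD has more than one partition. The indexing logic can be checked without a cluster. Below is a minimal sketch using plain Scala collections, whose zipWithIndex behaves the same way on a local Seq but returns Int rather than Long indices. `addSerial` is a hypothetical name, and the sketch assumes space-separated "<id> <name>" lines as in the question.

```scala
object AddSerial {
  // Hypothetical helper mirroring the RDD answer on a local Seq:
  // pair each line with its 0-based index, then emit (id, name, "SERIAL-<index + 1>").
  def addSerial(lines: Seq[String]): Seq[(String, String, String)] =
    lines.zipWithIndex.map { case (line, i) =>
      val fields = line.trim.split(" ", 2) // assumes "<id> <name>" lines
      (fields(0), fields(1), s"SERIAL-${i + 1}")
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1929 abc", "2384 def", "8753 ghi", "3893 jkl")
    addSerial(lines).foreach { case (id, name, serial) =>
      println(s"$id $name $serial") // prints lines like "1929 abc SERIAL-1"
    }
  }
}
```

With Spark, the same lambda body should work on serialRDD.zipWithIndex, where i is a Long instead of an Int.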