user1189851

Reputation: 5041

Add a column with a rank to an RDD in Spark (Scala)

Unfortunately, we still have to use Spark 1.0.0 and need to work with RDDs. I have an RDD that is created from a CSV file.

val serialRDD = sc.textFile(path)

If we print each line of the RDD, we get something like this (an id and a string):

1929  abc
2384  def
8753  ghi
3893  jkl

I want to add another column containing a second id: a string like "SERIAL-<RANK>", where RANK is 1, 2, 3, etc., auto-incrementing by 1.

The output should be like:

1929  abc  SERIAL-1
2384  def  SERIAL-2
8753  ghi  SERIAL-3
3893  jkl  SERIAL-4

How can I do this using only RDDs?

Upvotes: 1

Views: 762

Answers (1)

cheseaux

Reputation: 5315

You can use zipWithIndex and map to get it done:

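// sc.textFile yields an RDD[String], so split each line into its two fields first
serialRDD.map(_.trim.split("\\s+")).zipWithIndex.map { case (r, i) => (r(0), r(1), s"SERIAL-${i + 1}") }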

I used string interpolation to build the SERIAL-X string, and added 1 to the index because zipWithIndex starts at 0.
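To try the logic without a cluster, here is a minimal sketch that runs the same zipWithIndex-then-map steps on a plain Scala Seq standing in for the RDD. The object name SerialRankSketch and the helper addSerial are hypothetical, and the whitespace split is an assumption based on the sample lines in the question. On a real RDD the index is a Long rather than an Int, and zipWithIndex may trigger an extra Spark job when the RDD has more than one partition.

```scala
// Plain-Scala sketch: a Seq stands in for the RDD so this runs without Spark.
object SerialRankSketch {
  // Hypothetical helper: splits "id  value" on whitespace and appends SERIAL-<rank>.
  def addSerial(lines: Seq[String]): Seq[(String, String, String)] =
    lines.zipWithIndex.map { case (line, i) =>
      val fields = line.trim.split("\\s+")
      // zipWithIndex is 0-based, so add 1 to start the rank at 1.
      (fields(0), fields(1), s"SERIAL-${i + 1}")
    }

  def main(args: Array[String]): Unit = {
    // Sample rows taken from the question.
    val sample = Seq("1929  abc", "2384  def", "8753  ghi", "3893  jkl")
    addSerial(sample).foreach { case (id, value, serial) =>
      println(s"$id  $value  $serial")
    }
  }
}
```

If the result should be a line of text again rather than a tuple, the last step can build a single string, as the println above does.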

Upvotes: 4
