codehammer

Reputation: 885

Spark RDD tuple transformation

I'm trying to transform an RDD of tuples of Strings from this format:

(("abc","xyz","123","2016-02-26T18:31:56"),"15")

to this one:

(("abc","xyz","123"),"2016-02-26T18:31:56","15")

Basically, I want to separate out the timestamp string as its own tuple element. I tried the following, but the result is neither clean nor correct.

val result = rdd.map(r => (r._1.toString.split(",").toVector.dropRight(1).toString, r._1.toString.split(",").toList.last.toString, r._2))

However, it results in

(Vector(("abc", "xyz", "123"),"2016-02-26T18:31:56"),"15")

The expected output I'm looking for is

(("abc", "xyz", "123"),"2016-02-26T18:31:56","15")

This way I can access the elements using r._1, r._2 (the timestamp string) and r._3 in a separate map operation.

Any hints/pointers will be greatly appreciated.

Upvotes: 0

Views: 3524

Answers (1)

Ton Torres

Reputation: 1519

Vector.toString includes the collection name "Vector" in its result. To join only the elements, use Vector.mkString(",") instead.

Example:

scala> val xs = Vector(1,2,3)
xs: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)

scala> xs.toString
res25: String = Vector(1, 2, 3)

scala> xs.mkString
res26: String = 123

scala> xs.mkString(",")
res27: String = 1,2,3
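
Applied to the question's data, the string-based route might look like the sketch below. It runs on a single local tuple rather than an RDD, and the names r, parts, key, ts and out are hypothetical. Note that a tuple's toString wraps the fields in parentheses, so they are stripped before splitting, and the approach assumes no field contains a comma.

```scala
// Sample element in the question's shape (a local value, no Spark needed).
val r = (("abc", "xyz", "123", "2016-02-26T18:31:56"), "15")

// Tuple4.toString wraps the fields in parentheses, so strip them before splitting.
// Assumption: none of the fields contains a comma.
val parts = r._1.toString.stripPrefix("(").stripSuffix(")").split(",").toVector

// Everything except the last field, joined back into a single String.
val key = parts.dropRight(1).mkString(",")

// The last field is the timestamp.
val ts = parts.last

// Three-element result: joined key String, timestamp, original second element.
val out = (key, ts, r._2)
```

Here the first element of out is one String, not a tuple, so its individual fields are not directly accessible.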

However, if you want to access (abc,xyz,123) as a tuple rather than as a String, you can pattern match on the nested tuple instead:

val res = rdd.map{
  case ((a:String,b:String,c:String,ts:String),d:String) => ((a,b,c),ts,d)
}
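
To try the pattern match without a Spark cluster, here is a minimal sketch that applies the same case expression to a plain Scala List standing in for the RDD. The names Row, reshape and rows, and the sample data, are assumptions for illustration; on a real RDD the same function could be passed to rdd.map.

```scala
// Hypothetical alias for the element type: a 4-tuple of Strings paired with a String.
type Row = ((String, String, String, String), String)

// Move the timestamp out of the inner tuple into its own position.
def reshape(row: Row): ((String, String, String), String, String) = row match {
  case ((a, b, c, ts), d) => ((a, b, c), ts, d)
}

// Sample data mirroring the question; a List is used here instead of an RDD.
val rows: List[Row] = List((("abc", "xyz", "123", "2016-02-26T18:31:56"), "15"))
val res = rows.map(reshape)

// The elements are now reachable as _1, _2 (the timestamp) and _3.
res.foreach(r => println(s"${r._1} | ${r._2} | ${r._3}"))
```

Because the element type is spelled out, the compiler checks the tuple shape, and no string splitting or type annotations inside the pattern are needed.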

Upvotes: 1
