datasure

Reputation: 53

Tuple values for a key in a Scala RDD

I have an RDD of key-value records in Scala. I want to transform it so that each element has the form (key, tuple(values)).

I tried using map, but it did not work. In PySpark I would have used map(lambda x: (x[0], list(x[1:]))).

For example, (a,1,2,3,4), (b,4,5,6), (c,1,3) should become [a,(1,2,3,4)], [b,(4,5,6)], [c,(1,3)].

Upvotes: 3

Views: 922

Answers (1)

Krzysztof Atłasik

Reputation: 22635

In Scala 2, tuples are hard to handle in a generic way (this changes in Scala 3), so the most straightforward solution is to create a helper object with one overloaded method per tuple arity:

object TupleUtil {
  // One overload per tuple arity; all values after the key must share the type V.
  def splitHead[K,V](t: (K,V,V)): (K,(V,V)) = t._1 -> (t._2, t._3)
  def splitHead[K,V](t: (K,V,V,V)): (K,(V,V,V)) = t._1 -> (t._2, t._3, t._4)
  def splitHead[K,V](t: (K,V,V,V,V)): (K,(V,V,V,V)) = t._1 -> (t._2, t._3, t._4, t._5)
  // ...and so on, up to Tuple22
}
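
As a rough illustration of how such a helper might be called, here is a minimal sketch. It repeats two of the overloads so that it compiles on its own, and it uses a local Seq in place of the RDD so that no Spark dependency is needed. The names SplitHeadDemo, rows and keyed, and the sample data, are made up for this example. With a real RDD of the same element type, the call would presumably look the same: rdd.map(t => TupleUtil.splitHead(t)).

```scala
// Two of the overloads, repeated here so the sketch is self-contained.
object TupleUtil {
  def splitHead[K, V](t: (K, V, V)): (K, (V, V)) = (t._1, (t._2, t._3))
  def splitHead[K, V](t: (K, V, V, V)): (K, (V, V, V)) = (t._1, (t._2, t._3, t._4))
}

object SplitHeadDemo {
  // Hypothetical sample data; a local Seq stands in for the RDD.
  val rows: Seq[(String, Int, Int, Int)] = Seq(("a", 1, 2, 3), ("b", 4, 5, 6))

  // The compiler selects the overload from the static arity of the tuple.
  val keyed: Seq[(String, (Int, Int, Int))] = rows.map(t => TupleUtil.splitHead(t))
}
```

Because the overload is chosen at compile time, every element of the collection must have the same tuple type; the helper does not handle rows of varying length within one collection.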

Alternatively, if you can use shapeless, its tuple syntax gives you head and tail directly, so for a tuple t you could simply do:

import shapeless.syntax.std.tuple._

(t.head, t.tail)

To use shapeless, add the dependency to your build.sbt:

libraryDependencies += "com.chuusai" %% "shapeless" % "2.3.3"
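
Regarding the remark that this changes in Scala 3: a minimal sketch, assuming a Scala 3 compiler, where head and tail are available on non-empty tuples without any extra library. The object name Scala3TupleDemo and the sample tuple are made up for this example.

```scala
// Scala 3 only: head and tail are built-in operations on non-empty tuples.
object Scala3TupleDemo {
  // Hypothetical sample row: a key followed by its values.
  val t = ("a", 1, 2, 3, 4)

  // Split the key from the remaining values, as in the shapeless version.
  val split = (t.head, t.tail)
}
```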

Upvotes: 4
