Surender Raja

Reputation: 3599

How do we flatten a List with the same key in Spark?

I have an RDD like the one below:

 Array[(String, List[Int])] = Array((2008,List(40, 20)), (2000,List(30, 10)), (2001,List(9)))

I want to flatten the values so that each one is paired with its key.

Expected output:

 Array[(String, Int)]

 Array((2008,40) ,(2008,20) ,(2000,30),(2000,10),(2001,9))

Can someone help me get this result?

Upvotes: 1

Views: 694

Answers (2)

freedev

Reputation: 30027

I would try something like this (keys written as strings to match the question's types):

val l = Array(("2008", List(40, 20)), ("2000", List(30, 10)), ("2001", List(9)))

l.flatMap(pair => pair._2.map(listElem => (pair._1, listElem)))

Upvotes: 1

Evgeny Veretennikov

Reputation: 4229

Transform each tuple into a list of tuples, then just use flatten:

scala> val arr = Array(("2008", List(40, 20)), ("2000", List(30, 10)), ("2001", List(9)))
arr: Array[(String, List[Int])] = Array((2008,List(40, 20)), (2000,List(30, 10)), (2001,List(9)))
scala> arr.map { case (s, list) => list map { i => (s, i) } }.flatten
res3: Array[(String, Int)] = Array((2008,40), (2008,20), (2000,30), (2000,10), (2001,9))
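
As a side note, the map-then-flatten step can also be written as a single flatMap. The snippet below is a minimal sketch on a plain Scala Array standing in for the RDD; the names data, flat and rdd are made up for illustration. The commented-out line assumes a real RDD[(String, List[Int])], where Spark's pair-RDD method flatMapValues should produce the same pairs, but I have not run it against the asker's data.

```scala
// Plain Scala collection standing in for the RDD from the question.
val data: Array[(String, List[Int])] =
  Array(("2008", List(40, 20)), ("2000", List(30, 10)), ("2001", List(9)))

// flatMap does the map and the flatten in one step:
// each (key, values) pair expands into one (key, value) pair per element.
val flat: Array[(String, Int)] =
  data.flatMap { case (key, values) => values.map(value => (key, value)) }

// Assumption: with an actual RDD[(String, List[Int])] called rdd (hypothetical name),
// the equivalent Spark call would be:
//   rdd.flatMapValues(values => values)
```

The flatMap form avoids building the intermediate Array of Lists that map followed by flatten creates.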

Upvotes: 1
