Breandán

Reputation: 1883

Spark/Scala: Expand a list of (List[String], String) tuples

Basically the same as this question, but for Scala.

How can I do the following transformation given an RDD with elements of the form

(List[String], String) => (String, String)

e.g.

([A,B,C], X)
([C,D,E], Y)

to

(A, X)
(B, X)
(C, X)
(C, Y)
(D, Y)
(E, Y)


Upvotes: 2

Views: 4034

Answers (5)

Gaurav Abbi

Reputation: 645

Using a for comprehension and making the parameters generic:

    // Pair every element of the list with the tuple's second component.
    def convert[F, S](input: (List[F], S)): List[(F, S)] = {
      for {
        x <- input._1
      } yield {
        (x, input._2)
      }
    }

A sample call (note that the argument is a single tuple):

convert((List(1, 2, 3), "A"))

will give you

List((1,A), (2,A), (3,A))
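The helper above handles a single tuple, while the question has a whole collection of them. A minimal sketch of how the two might be combined, assuming a plain Scala `List` stands in for the RDD (the object name `ExpandWithConvert` and the sample data are made up for illustration):

```scala
object ExpandWithConvert {
  // Same helper as in the answer: pair each list element with the label.
  def convert[F, S](input: (List[F], S)): List[(F, S)] =
    for (x <- input._1) yield (x, input._2)

  // Hypothetical sample data mirroring the question.
  val pairs: List[(List[String], String)] =
    List((List("A", "B", "C"), "X"), (List("C", "D", "E"), "Y"))

  // flatMap applies convert to every tuple and concatenates the results.
  val expanded: List[(String, String)] = pairs.flatMap(p => convert(p))

  def main(args: Array[String]): Unit =
    println(expanded) // List((A,X), (B,X), (C,X), (C,Y), (D,Y), (E,Y))
}
```

The same `flatMap(p => convert(p))` call should carry over to an RDD, since `RDD.flatMap` also accepts a function returning a collection.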

Upvotes: 0

Till Rohrmann

Reputation: 13346

With Spark you can solve your problem with:

import org.apache.spark.{SparkConf, SparkContext}

object App {
  def main(args: Array[String]): Unit = {
    val input = Seq((List("A", "B", "C"), "X"), (List("C", "D", "E"), "Y"))

    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(input)

    // Pair every element of each list with that list's label.
    val result = rdd.flatMap {
      case (list, label) => list.map((_, label))
    }

    result.foreach(println)

    sc.stop()
  }
}

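This will output something like the following (the order may vary between runs, because the partitions are processed in parallel):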

(C,Y)
(D,Y)
(A,X)
(B,X)
(E,Y)
(C,X)

Upvotes: 2

ka4eli

Reputation: 5424

  val l = (List(1, 2, 3), "A")
  val result = l._1.map((_, l._2))
  println(result)

Will give you:

List((1,A), (2,A), (3,A))

Upvotes: 0

GameOfThrows

Reputation: 4510

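I think the pair-RDD method flatMapValues suits this case best.

val input = List((List("A", "B", "C"), "X"), (List("A", "B", "C"), "Y"))
val rdd = sc.parallelize(input)
// Swap each tuple to (label, list), then expand the list while keeping the label as the key.
val output = rdd.map(x => (x._2, x._1)).flatMapValues(x => x)

This pairs each label with every value in its list, giving an RDD[(String, String)] that contains (X,A), (X,B), (X,C), ..., (Y,A), (Y,B), (Y,C). Note that the pairs come out as (label, element); append .map(_.swap) if you need (element, label) as in the question.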

Upvotes: 1

Marth

Reputation: 24802

scala> val l = List((List('a, 'b, 'c) -> 'x), List('c, 'd, 'e) -> 'y)
l: List[(List[Symbol], Symbol)] = List((List('a, 'b, 'c),'x),
                                       (List('c, 'd, 'e),'y))

scala> l.flatMap { case (innerList, c) => innerList.map(_ -> c) }
res0: List[(Symbol, Symbol)] = List(('a,'x), ('b,'x), ('c,'x), ('c,'y),
                                    ('d,'y), ('e,'y))
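The `flatMap` plus `map` combination above can also be written as a for comprehension with two generators, which some readers find easier to follow. A minimal sketch, assuming plain strings instead of symbol literals (the object name `ForComprehensionExpand` and its values are made up for illustration):

```scala
object ForComprehensionExpand {
  // Hypothetical sample data mirroring the question.
  val l: List[(List[String], String)] =
    List(List("a", "b", "c") -> "x", List("c", "d", "e") -> "y")

  // The first generator walks the outer list, the second walks each inner list.
  val expanded: List[(String, String)] =
    for {
      (innerList, c) <- l
      elem <- innerList
    } yield elem -> c

  def main(args: Array[String]): Unit =
    println(expanded) // List((a,x), (b,x), (c,x), (c,y), (d,y), (e,y))
}
```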

Upvotes: 8
