zork

Reputation: 2135

Spark: Unfold RDD to pairs?

As input I have an array where each element is a tuple (tag, elements), where tag is an integer and elements is a list of integers. I need to unfold this data so that the result is a collection in which each input element becomes tuples of the form (tag, el1), (tag, el2), ... (tag, elN). To illustrate:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.log4j.Logger
import org.apache.log4j.Level

object Unfold {


  val data = Array( 
    (1,List(2,3,5)),
    (11,List(21,31,51)),
    (111, List(211,311,511))
  )

  val shouldGet = List(
    (1,2), (1,3), (1,5),
    (11,21), (11,31), (11,51),
    (111,211), (111,311), (111,511)
  )
  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    // set up environment
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("Simple")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(data)
    val result = data.map { case (tag, list) => (tag, ???) } // how to expand the list into (tag, element) pairs?
  }
}

Any ideas how to unfold the RDD elements like this?

Upvotes: 1

Views: 252

Answers (2)

Metropolis

Reputation: 2128

Another approach:

val result = for {
  (tag, list) <- data
  x <- list
} yield (tag, x)

This is nothing more than syntactic sugar for a flatMap and a map, but in some cases a for-comprehension can be more readable.
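
For reference, here is a rough sketch of what that for-comprehension corresponds to after desugaring (approximate: the compiler may also insert a withFilter for the tuple pattern):

// Roughly the desugared form of the for-comprehension above,
// reusing the data array from the question.
val desugared = data.flatMap { case (tag, list) =>
  list.map(x => (tag, x))
}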

Upvotes: 0

Jason Scott Lenderman

Reputation: 1918

Something like this should work:

val result = data.flatMap({ case (tag, list) => list.map(x => (tag, x)) })

Or this, which might be a bit faster in some situations because the view evaluates the inner map lazily and avoids building an intermediate list:

val result = data.flatMap({ case (tag, list) => list.view.map(x => (tag, x)) })
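
To tie this back to the question, here is a minimal usage sketch (assuming the sc and data values from the question are in scope) that applies the same transformation to the RDD and prints the resulting pairs:

// Sketch: run the flatMap on the RDD built from the question's data.
val pairsRdd = sc.parallelize(data)
  .flatMap { case (tag, list) => list.map(x => (tag, x)) }
pairsRdd.collect().foreach(println)  // (1,2), (1,3), (1,5), (11,21), ...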

Upvotes: 1
