Reputation: 2135
As input I have an array where each element is a tuple: (tag, elements), where tag is an integer and elements is a list of integers. I need to unfold this data, so that in the result each input element becomes a series of tuples: (tag, el1), (tag, el2), ..., (tag, elN). To illustrate:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.log4j.Logger
import org.apache.log4j.Level
object Unfold {

  val data = Array(
    (1, List(2, 3, 5)),
    (11, List(21, 31, 51)),
    (111, List(211, 311, 511))
  )

  val shoudGet = List(
    (1, 2), (1, 3), (1, 5),
    (11, 21), (11, 31), (11, 51),
    (111, 211), (111, 311), (111, 511)
  )

  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // set up environment
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("Simple")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(data)
    val result = data.map { case (tag, list) => (tag, ???) }
  }
}
Any ideas how to unfold an RDD element?
Upvotes: 1
Views: 252
Reputation: 2128
Another approach:
val result = for {
  (tag, list) <- data
  x <- list
} yield (tag, x)
which is nothing more than syntactic sugar for a flatMap and a map, but in some cases for-comprehensions can be more readable.
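For reference, the comprehension above desugars into roughly the following flatMap/map chain (a sketch of what the compiler generates, modulo pattern-matching details):
// Roughly equivalent desugared form of the for-comprehension:
val result = data.flatMap { case (tag, list) =>
  list.map(x => (tag, x))
}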
Upvotes: 0
Reputation: 1918
Something like this should work:
val result = data.flatMap { case (tag, list) => list.map(x => (tag, x)) }
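For a quick sanity check, the result can be compared against the shoudGet list defined in the question (a minimal sketch, assuming that value is in scope):
// Sanity check against the expected output from the question:
assert(result.toList == shoudGet)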
Alternatively, mapping over a lazy view avoids building an intermediate list for each element, which might be a bit faster in some situations:
val result = data.flatMap { case (tag, list) => list.view.map(x => (tag, x)) }
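The same pattern works directly on the RDD from the question; here is a minimal sketch, assuming the sc and rdd values from the original main method are in scope:
// Unfold the RDD itself rather than the local array:
val unfolded = rdd.flatMap { case (tag, list) =>
  list.map(x => (tag, x))
}
// Collect back to the driver just to inspect the result (fine for tiny data):
unfolded.collect().foreach(println)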
Upvotes: 1