kambiz

Reputation: 27

Apache Spark - Scala - how to FlatMap (k, {v1,v2,v3,...}) to ((k,v1),(k,v2),(k,v3),...)

I got this:

val vector: RDD[(String, Array[String])] = [("a", {v1,v2,..}),("b", {u1,u2,..})]

and I want to convert it to:

RDD[(String, String)] = [("a",v1), ("a",v2), ..., ("b",u1), ("b",u2), ...]

Any idea how to do that using flatMap?
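
For reference, the sample data above can be built as a real RDD like this (a minimal sketch, assuming a live SparkContext named sc; v1, v2, u1, u2 are placeholder strings):

import org.apache.spark.rdd.RDD

// Each key maps to an array of values, matching the shape described above
val vector: RDD[(String, Array[String])] =
  sc.parallelize(Seq(("a", Array("v1", "v2")), ("b", Array("u1", "u2"))))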

Upvotes: 2

Views: 362

Answers (3)

avr

Reputation: 4893

Using a single-parameter function:

vector.flatMap(data => data._2.map((data._1, _)))
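
The placeholder in (data._1, _) desugars to an explicit inner function; spelled out, the same one-liner reads (same behaviour, just more verbose):

// data._1 is the key, data._2 is the array of values
vector.flatMap(data => data._2.map(value => (data._1, value)))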

Upvotes: 0

Avihoo Mamka

Reputation: 4786

You definitely need to use flatMap like you mentioned, but in addition you need to use Scala's map on the values as well.

For example:

val idToVectorValue: RDD[(String, String)] =
  vector.flatMap { case (id, values) => values.map(value => (id, value)) }
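
Collecting the result on the sample data from the question would then give roughly (a sketch, assuming the vector RDD shown there):

idToVectorValue.collect()
// Array((a,v1), (a,v2), ..., (b,u1), (b,u2), ...)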

Upvotes: 2

Yuval Itzchakov

Reputation: 149608

This:

vector.flatMap { case (x, arr) => arr.map((x, _)) }

Will give you:

scala> val vector = sc.parallelize(Vector(("a", Array("b", "c")), ("b", Array("d", "f"))))
vector: org.apache.spark.rdd.RDD[(String, Array[String])] = ParallelCollectionRDD[3] at parallelize at <console>:27

scala> vector.flatMap { case (x, arr) => arr.map((x, _)) }.collect
res4: Array[(String, String)] = Array((a,b), (a,c), (b,d), (b,f))

Upvotes: 4
