Carsten
Carsten

Reputation: 2040

How to split RDD of (String, Array[String]) into RDD of (String, String) for each item in array?

I have a PairRDD in the form RDD[(String, Array[String])]. I want to flatten the values so that I have an RDD[(String, String)] where each of the elements in the Array[String] of the first RDD become a dedicated element in the 2nd RDD.

For instance, my first RDD has the following elements:

("a", Array("x", "y"))
("b", Array("y", "z"))

The result I want is this:

("a", "x")
("a", "y")
("b", "y")
("b", "z")

How can I do this? flatMapValues(f: Array[String] => TraverableOnce[String]) seems to be the right choice here, but what do I need to use as argument f?

Upvotes: 0

Views: 1919

Answers (1)

Carsten
Carsten

Reputation: 2040

To achieve the desired result, do:

val rdd1: RDD[(Any, Array[Any])] = ...
val rddFlat: RDD[(Any, Any)] = rdd1.flatMapValues(identity[Array[Any]])

The result looks like the one asked for in the question.

Upvotes: 4

Related Questions