user3803714

Reputation: 5389

Spark rdd unique values across a paired rdd

I have a Spark RDD of this type: RDD[(Int, Array[Int])]

Sample values of that RDD are:

(100, Array(1, 2, 3, 4, 5))

(200, Array(1, 2, 50, 20))

(300, Array(30, 2, 400, 1))

I would like to get all the unique values across the Array elements of this RDD. I don't care about the keys; I just want all the unique values. So the result for the sample above is (1, 2, 3, 4, 5, 20, 30, 50, 400).

What would be an efficient way to do that?

Upvotes: 1

Views: 5133

Answers (1)

Jason Scott Lenderman

Reputation: 1918

I think this should probably work:

val result = rdd.flatMap(_._2).distinct

if you want the result in an RDD, or

val result = rdd.flatMap(_._2).distinct.collect

if you want the result in a local collection.
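For reference, here is a self-contained sketch of the whole pipeline using the sample data from the question (assuming a local SparkContext; the object and variable names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UniqueValues {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("unique-values").setMaster("local[*]"))

    // Sample paired RDD from the question: RDD[(Int, Array[Int])]
    val rdd = sc.parallelize(Seq(
      (100, Array(1, 2, 3, 4, 5)),
      (200, Array(1, 2, 50, 20)),
      (300, Array(30, 2, 400, 1))
    ))

    // Drop the keys, flatten the arrays, and deduplicate across partitions.
    val unique = rdd.flatMap(_._2).distinct.collect.sorted

    println(unique.mkString(", "))  // 1, 2, 3, 4, 5, 20, 30, 50, 400

    sc.stop()
  }
}
```

Note that `distinct` triggers a shuffle, so on large data you may want to control its parallelism with the optional `numPartitions` argument, e.g. `distinct(200)`.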

Upvotes: 5
