Reputation: 5389
I have a Spark RDD of this datatype: RDD[(Int, Array[Int])]
Sample values of that RDD are:
100, Array(1, 2, 3, 4, 5)
200, Array(1, 2, 50, 20)
300, Array(30, 2, 400, 1)
I would like to get all the unique values among all the Array elements of this RDD. I don't care about the keys; I just want all the unique values. So the result from the sample above is (1, 2, 3, 4, 5, 20, 30, 50, 400).
What would be an efficient way to do that?
Upvotes: 1
Views: 5133
Reputation: 1918
I think this should work:
val result = rdd.flatMap(_._2).distinct
if you want the result as an RDD, or
val result = rdd.flatMap(_._2).distinct.collect
if you want the result in a local collection.
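If you want to sanity-check the logic without spinning up a Spark cluster, the same flatMap/distinct pattern works on plain Scala collections, since the RDD API mirrors them. A minimal sketch using the sample data from the question (no SparkContext needed; the sample values are assumptions from the question):

```scala
// Plain-Scala equivalent of rdd.flatMap(_._2).distinct,
// using the sample pairs from the question.
val data = Seq(
  (100, Array(1, 2, 3, 4, 5)),
  (200, Array(1, 2, 50, 20)),
  (300, Array(30, 2, 400, 1))
)

// Flatten out the arrays (dropping the keys), then de-duplicate.
val unique = data.flatMap(_._2).distinct.sorted

println(unique.mkString(", "))  // 1, 2, 3, 4, 5, 20, 30, 50, 400
```

On a real RDD, `distinct` triggers a shuffle, so for very large data you may want to tune its `numPartitions` argument; but for most cases `flatMap` followed by `distinct` is the idiomatic way to do this.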
Upvotes: 5