Ysak

Reputation: 33

Spark(scala): Count all distinct values of a whole column on RDD

I have this RDD:

val resultRdd: RDD[(VertexId, String, Seq[Long])]

I want to count the distinct values across the Seq[Long] of all records.

For example, if I have 3 records whose Seq values are as follows:

VertexId | String | Seq[Long]
---------|--------|----------
1        | x      | 1, 3
2        | x      | 1, 5
3        | x      | 2, 3, 6

The result should be 5, the size of the distinct set {1, 2, 3, 5, 6}.

Thanks :)

Upvotes: 1

Views: 3077

Answers (1)

Tzach Zohar

Reputation: 37822

resultRdd.flatMap(_._3).distinct().count()
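To unpack the one-liner: `_._3` selects the `Seq[Long]` from each tuple, `flatMap` flattens all those sequences into a single `RDD[Long]`, `distinct()` removes duplicates, and `count()` returns the number of unique values. A minimal sketch with the question's sample data, assuming an existing `SparkContext` named `sc` (the variable name is an assumption, not from the question):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId

// Hypothetical setup: build the sample RDD from the question's three records
// using an assumed SparkContext `sc`.
val resultRdd: RDD[(VertexId, String, Seq[Long])] = sc.parallelize(Seq(
  (1L, "x", Seq(1L, 3L)),
  (2L, "x", Seq(1L, 5L)),
  (3L, "x", Seq(2L, 3L, 6L))
))

// flatMap(_._3) -> RDD(1, 3, 1, 5, 2, 3, 6)
// distinct()    -> RDD(1, 2, 3, 5, 6)
// count()       -> 5
val n: Long = resultRdd.flatMap(_._3).distinct().count()
```

Note that `distinct()` involves a shuffle, so on large data it is more expensive than the `flatMap` alone.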

Upvotes: 6
