Reputation: 37
I created an RDD of key/values this way:
RDD[(String, Int)]: rdd.map(row => row.split(1) -> 1).reduceByKey(_ + _)
How can I get the top five elements based on values?
Upvotes: 1
Views: 842
Reputation: 61666
You can use rdd.top
in order to avoid a full sort of the rdd:
rdd.top(5)(Ordering[Int].on(_._2))
This defines an order on the values and makes a single O(n)
pass on the rdd to get the 5 top items per value.
Upvotes: 2