Ranit Dholey

Reputation: 37

How to get the top n elements of an RDD by value?

I created an RDD of key/values this way:

RDD[(String, Int)]: rdd.map(row => row.split(1) -> 1).reduceByKey(_ + _)

How can I get the top five elements based on values?

Upvotes: 1

Views: 842

Answers (1)

Xavier Guihot

Reputation: 61666

You can use rdd.top to avoid a full sort of the RDD:

rdd.top(5)(Ordering[Int].on(_._2))

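This defines an ordering on the values (the second element of each pair) and makes a single O(n) pass over the RDD to collect the 5 pairs with the highest values. The result is returned to the driver as an Array, largest value first.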
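For a fuller picture, here is a minimal sketch of the same call inside a small local program. The object name TopByValue, the helper topByValue, the sample keys, and the local[*] master are all assumptions made for illustration; they are not part of the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TopByValue {
  // Hypothetical helper: counts occurrences of each key, then returns the
  // n (key, count) pairs with the highest counts, largest first.
  def topByValue(sc: SparkContext, keys: Seq[String], n: Int): Array[(String, Int)] = {
    val counts = sc.parallelize(keys).map(_ -> 1).reduceByKey(_ + _)
    // Order pairs by their value (the count) rather than by the whole tuple.
    counts.top(n)(Ordering[Int].on(_._2))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup, for experimentation only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("top-by-value")
    val sc = new SparkContext(conf)
    try {
      val keys = Seq("a", "b", "a", "c", "a", "b", "d")
      topByValue(sc, keys, 2).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Since top brings its result back to the driver, it is best suited to small n. If the ranked data has to stay distributed, sorting with sortBy(_._2, ascending = false) is the usual alternative, at the cost of a full sort.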

Upvotes: 2
