Guo

Reputation: 1803

How to get the max key in a Spark RDD and remove it?

I have an RDD object:

// some data in an RDD[(Int, Int)] object
(1, 2)
(3, 2)
(2, 3)
(5, 4)
(2, 7)
(5, 2)
(5, 7)

I want to find the max key and remove every pair that has it. Here the max key is 5, so the result I want is:

// a new RDD[(Int, Int)] object
(1, 2)
(3, 2)
(2, 3)
(2, 7)

Could you help me? Thank you!

Upvotes: 1

Views: 4028

Answers (1)

Sumit

Reputation: 1420

Use RDD.max() on the keys to get the highest key, then use filter to keep only the pairs whose key is not the highest key. Sorting first is not required, because max() scans the whole RDD anyway.
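A minimal sketch of the max-then-filter approach, assuming Spark 2.x or later with a local `SparkSession`. The object name `DropMaxKey` and the helper `dropMaxKey` are made up for illustration and are not part of Spark:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DropMaxKey {
  // Hypothetical helper: drops every pair whose key equals the largest key.
  def dropMaxKey(rdd: RDD[(Int, Int)]): RDD[(Int, Int)] = {
    // max() is an action, so this triggers one pass over the data.
    // It throws on an empty RDD, so guard with isEmpty() if that can happen.
    val maxKey = rdd.keys.max()
    // filter is a lazy transformation; it runs when the result is used.
    rdd.filter { case (k, _) => k != maxKey }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for demonstration
      .appName("DropMaxKey")
      .getOrCreate()

    val data = spark.sparkContext.parallelize(
      Seq((1, 2), (3, 2), (2, 3), (5, 4), (2, 7), (5, 2), (5, 7)))

    // Prints the pairs whose key is not 5.
    dropMaxKey(data).collect().foreach(println)

    spark.stop()
  }
}
```

The RDD is traversed twice (once for `max()`, once for `filter`), so calling `cache()` on it beforehand may help if it is expensive to recompute.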

or

You can also register the data as a DataFrame and run a simple SQL query to get the result.
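The SQL route might look like the sketch below. It assumes Spark 2.x or later (`SparkSession`, `createOrReplaceTempView`, and a scalar subquery in `WHERE`); on Spark 1.x the registration API is different. The column names `k` and `v`, the view name `pairs`, and the helper `dropMaxKeySql` are illustrative choices, not anything from the question:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DropMaxKeySql {
  // Hypothetical helper: same result as max-then-filter, expressed in SQL.
  def dropMaxKeySql(spark: SparkSession, rdd: RDD[(Int, Int)]): RDD[(Int, Int)] = {
    import spark.implicits._

    // Expose the pairs as a temporary view with named columns.
    rdd.toDF("k", "v").createOrReplaceTempView("pairs")

    // Keep only rows whose key is below the overall maximum key.
    val result = spark.sql(
      "SELECT k, v FROM pairs WHERE k < (SELECT MAX(k) FROM pairs)")

    // Convert the rows back to RDD[(Int, Int)].
    result.rdd.map(row => (row.getInt(0), row.getInt(1)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for demonstration
      .appName("DropMaxKeySql")
      .getOrCreate()

    val data = spark.sparkContext.parallelize(
      Seq((1, 2), (3, 2), (2, 3), (5, 4), (2, 7), (5, 2), (5, 7)))

    dropMaxKeySql(spark, data).collect().foreach(println)

    spark.stop()
  }
}
```

For data that is already an RDD, the plain `max()` plus `filter` version is probably simpler; the SQL form is mostly useful when the rest of the pipeline is already working with DataFrames.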

Upvotes: 1
