Reputation: 1803
I have an RDD object:
// some data in an RDD[(Int, Int)] object
(1, 2)
(3, 2)
(2, 3)
(5, 4)
(2, 7)
(5, 2)
(5, 7)
I want to find the maximum key and remove every pair that has it. Here the max key is 5, so the result I want is:
// a new RDD[(Int, Int)] object
(1, 2)
(3, 2)
(2, 3)
(2, 7)
Could you help me? Thank you!
Upvotes: 1
Views: 4028
Reputation: 1420
You don't need to sort the data first. Call RDD.max() on the keys
to get the highest key (this is a single pass over the data), then use filter
to keep only the pairs whose key is different from the highest key.
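The max-then-filter approach could look roughly like the sketch below. The object name `DropMaxKey`, the helper `dropMaxKey`, and the local-mode setup are made up for illustration; it assumes a Spark version where `rdd.keys` and `rdd.isEmpty()` are available on a pair RDD (1.3 or later).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DropMaxKey {
  // Hypothetical helper: drops every pair whose key equals the largest key.
  def dropMaxKey(rdd: RDD[(Int, Int)]): RDD[(Int, Int)] = {
    if (rdd.isEmpty()) {
      rdd // max() would throw on an empty RDD, so return it unchanged
    } else {
      val maxKey = rdd.keys.max() // action: one pass over the keys, no sort
      rdd.filter { case (k, _) => k != maxKey } // lazy transformation
    }
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("drop-max-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(
        Seq((1, 2), (3, 2), (2, 3), (5, 4), (2, 7), (5, 2), (5, 7)))
      dropMaxKey(data).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that `max()` is an action, so the input is read twice in total (once for the max, once when the filtered RDD is evaluated). If the RDD is expensive to compute, caching it before calling the helper may be worth it.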
Alternatively, you can convert the RDD to a DataFrame, register it as a temporary table,
and run a simple SQL query to get the result.
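A rough sketch of the DataFrame/SQL route, assuming Spark 2.x or later (`SparkSession`, `createOrReplaceTempView`, and scalar subqueries in `WHERE`). The object name, the helper `dropMaxKeySql`, the view name `pairs`, and the column names `k` and `v` are all invented for this example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DropMaxKeySql {
  // Hypothetical helper: same result as max-then-filter, expressed in SQL.
  def dropMaxKeySql(spark: SparkSession, rdd: RDD[(Int, Int)]): RDD[(Int, Int)] = {
    import spark.implicits._ // enables toDF and the tuple encoder

    // Column names and the view name are assumptions chosen for this sketch.
    rdd.toDF("k", "v").createOrReplaceTempView("pairs")

    val result = spark.sql(
      "SELECT k, v FROM pairs WHERE k < (SELECT MAX(k) FROM pairs)")

    // Convert back to an RDD[(Int, Int)]; tuple fields map by column position.
    result.as[(Int, Int)].rdd
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("drop-max-key-sql")
      .master("local[*]") // local mode is an assumption for experimentation
      .getOrCreate()
    try {
      val data = spark.sparkContext.parallelize(
        Seq((1, 2), (3, 2), (2, 3), (5, 4), (2, 7), (5, 2), (5, 7)))
      dropMaxKeySql(spark, data).collect().foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

For a plain RDD[(Int, Int)] the RDD version is usually simpler. The SQL form is mainly useful if the data already lives in a DataFrame.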
Upvotes: 1