Reputation: 6766
My question is based upon this question. I have a Spark pair RDD of (key, count): [(a,1), (b,2), (c,1), (d,3)].
How can I find both the key with the highest count and the actual count?
Upvotes: 5
Views: 4874
Reputation: 11
// take(1) returns a one-element Array, not an RDD: Array((b,5))
val top = sc.parallelize(Array(
  ("a", 1),
  ("b", 5),
  ("c", 1),
  ("d", 3))).sortBy(_._2, ascending = false).take(1)
This sorts the pairs by value in descending order and takes the topmost element, which holds both the key and its count.
Upvotes: 1
Reputation: 4771
(sc
    .parallelize([("a", 1), ("b", 5), ("c", 1), ("d", 3)])
    .max(key=lambda x: x[1]))
This returns ('b', 5), not just 5. The first parameter of max is the key function used for comparison (passed explicitly here), but max still returns the whole element, here the complete tuple.
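To see why the whole tuple comes back, it may help to mimic the comparison locally. The sketch below is plain Python (no Spark needed) and assumes that max with a key behaves like a pairwise reduce using Python's built-in max with that same key. The names pairs and max_by_key are made up for illustration and are not part of the Spark API.

```python
from functools import reduce

# Same sample data as above, held in a plain list instead of an RDD.
pairs = [("a", 1), ("b", 5), ("c", 1), ("d", 3)]


def max_by_key(items, key):
    """Pairwise reduction that keeps the element with the larger key(element).

    Hypothetical helper: a local stand-in for rdd.max(key=...), not Spark code.
    """
    return reduce(lambda a, b: max(a, b, key=key), items)


# The key function is only used to compare; the full tuple is what is returned.
best = max_by_key(pairs, key=lambda x: x[1])

# Unpack to get both the key and its count.
best_key, best_count = best
print(best_key, best_count)
```

On a real RDD the same unpacking should apply, for example key, count = rdd.max(key=lambda x: x[1]).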
Upvotes: 6