user2543622

Reputation: 6766

spark finding max value and the associated key

My question is based upon this question. I have a spark pair RDD (key, count): [(a,1), (b,2), (c,1), (d,3)].

How can I find both the key with the highest count and the count itself?

Upvotes: 5

Views: 4874

Answers (2)

user1501308

Reputation: 11

val myRDD = sc.parallelize(Array(
  ("a", 1),
  ("b", 5),
  ("c", 1),
  ("d", 3))).sortBy(_._2, false).take(1)

Sort on the values in descending order and take the topmost element.
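The same sort-descending-and-take-first idea can be sketched in plain Python (no Spark needed), which shows what the Scala snippet above computes; the sample data here matches that snippet:

```python
# Sort the (key, count) pairs by count in descending order,
# then take the first element: the pair with the highest count.
pairs = [("a", 1), ("b", 5), ("c", 1), ("d", 3)]
top = sorted(pairs, key=lambda kv: kv[1], reverse=True)[0]
print(top)  # ('b', 5)
```

Note that on an RDD, `sortBy` performs a full shuffle and sort of the data just to pick one element, so it is more expensive than `max` for this task.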

Upvotes: 1

Quentin Pradet

Reputation: 4771

(sc
    .parallelize([("a", 1), ("b", 5), ("c", 1), ("d", 3)])
    .max(key=lambda x: x[1]))

does return ('b', 5), not just 5. The key argument of max is the function used for comparison (made explicit here), but max still returns the whole element, in this case the complete tuple.
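Python's built-in max has the same semantics, so the behavior can be checked without a Spark cluster: the key function selects what to compare, while the whole element is returned.

```python
# key picks the comparison value (the count), but max returns
# the entire (key, count) tuple, not just the count.
pairs = [("a", 1), ("b", 5), ("c", 1), ("d", 3)]
best = max(pairs, key=lambda kv: kv[1])
print(best)  # ('b', 5)
```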

Upvotes: 6
