Reputation: 703
I would like to sort K/V pairs by value and then take the five biggest values. I managed to do this by swapping K/V with the first map, sorting in descending order by passing False to sortByKey, then swapping key and value back to their original order (second map) and taking the first 5, which are the biggest. The code is this:
RDD.map(lambda x:(x[1],x[0])).sortByKey(False).map(lambda x:(x[1],x[0])).take(5)
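To make the chain above concrete, here is a minimal pure-Python sketch of the same swap, sort descending, swap back, take five pipeline on a plain list of (key, value) tuples. It assumes hypothetical sample data in `pairs`, and Python's `list.sort(reverse=True)` stands in for `sortByKey(False)`; it runs locally and does not use Spark.

```python
# Hypothetical sample data: (key, value) pairs
pairs = [("a", 3), ("b", 10), ("c", 7), ("d", 1), ("e", 8), ("f", 5), ("g", 2)]

# map(lambda x: (x[1], x[0])): swap so the value becomes the key
swapped = [(v, k) for k, v in pairs]

# sortByKey(False): sort by the new key (the old value), descending
swapped.sort(key=lambda x: x[0], reverse=True)

# map(lambda x: (x[1], x[0])): swap back to (key, value), then take(5)
top5 = [(k, v) for v, k in swapped][:5]

print(top5)  # [('b', 10), ('e', 8), ('c', 7), ('f', 5), ('a', 3)]
```

The two swaps exist only because sortByKey can sort by nothing but the key; sorting directly by the value avoids them.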
I know there is a takeOrdered action in PySpark, but I only managed to sort on values (not on keys), and I don't know how to get a descending sort:
RDD.takeOrdered(5,key = lambda x: x[1])
Upvotes: 28
Views: 45181
Reputation: 866
Sort by keys (ascending):
RDD.takeOrdered(5, key = lambda x: x[0])
Sort by keys (descending, numeric keys only):
RDD.takeOrdered(5, key = lambda x: -x[0])
Sort by values (ascending):
RDD.takeOrdered(5, key = lambda x: x[1])
Sort by values (descending, numeric values only):
RDD.takeOrdered(5, key = lambda x: -x[1])
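takeOrdered(n, key=f) returns the n smallest elements as ranked by f, so negating a numeric key turns that into the n largest. The sketch below mimics this locally with `heapq.nsmallest`, which accepts the same `key` argument; the helper names `take_ordered` and `top` and the sample data are hypothetical stand-ins for illustration, not Spark itself. Negation only works for numbers: with string keys, -x[0] raises TypeError. For that case PySpark's RDD.top(n, key=...), which returns the largest elements in descending order, is the usual alternative, mirrored here by `heapq.nlargest`.

```python
import heapq

# Hypothetical sample data: (key, value) pairs with string keys, numeric values
pairs = [("a", 3), ("b", 10), ("c", 7), ("d", 1), ("e", 8), ("f", 5), ("g", 2)]

def take_ordered(data, n, key=None):
    """Hypothetical local stand-in for RDD.takeOrdered: n smallest by key, ascending."""
    return heapq.nsmallest(n, data, key=key)

def top(data, n, key=None):
    """Hypothetical local stand-in for RDD.top: n largest by key, descending."""
    return heapq.nlargest(n, data, key=key)

# Sort by values (descending): negate the numeric value
by_value_desc = take_ordered(pairs, 5, key=lambda x: -x[1])

# Sort by keys (descending): the keys are strings, so -x[0] would raise
# TypeError; use the "top" variant, which ranks largest-first, instead
by_key_desc = top(pairs, 5, key=lambda x: x[0])

print(by_value_desc)  # [('b', 10), ('e', 8), ('c', 7), ('f', 5), ('a', 3)]
print(by_key_desc)    # [('g', 2), ('f', 5), ('e', 8), ('d', 1), ('c', 7)]
```

On a real RDD the equivalent of the last call would be RDD.top(5, key=lambda x: x[0]).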
Upvotes: 79