arj

Reputation: 703

takeOrdered descending PySpark

I would like to sort K/V pairs by value and then take the five largest values. I managed to do this by swapping key and value with a first map, sorting in descending order by passing False to sortByKey, swapping key and value back with a second map, and then taking the first 5, which are the largest. The code is:

RDD.map(lambda x:(x[1],x[0])).sortByKey(False).map(lambda x:(x[1],x[0])).take(5)

I know there is a takeOrdered action in PySpark, but I only managed to sort by value (rather than by key), and I don't know how to get a descending sort:

RDD.takeOrdered(5,key = lambda x: x[1])

Upvotes: 28

Views: 45181

Answers (1)

aatishk

Reputation: 866

Sort by keys (ascending):

RDD.takeOrdered(5, key = lambda x: x[0])

Sort by keys (descending):

RDD.takeOrdered(5, key = lambda x: -x[0])

Sort by values (ascending):

RDD.takeOrdered(5, key = lambda x: x[1])

Sort by values (descending):

RDD.takeOrdered(5, key = lambda x: -x[1])
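
Note that the negation trick only works when the key or value being sorted is numeric. To check the behaviour without a Spark cluster, here is a minimal plain-Python sketch. It assumes that `heapq.nsmallest` is a fair local stand-in for `takeOrdered` (both return the n smallest items according to `key`, in ascending order); the sample data and the `take_ordered` helper are made up for illustration and are not part of PySpark.

```python
import heapq

# Hypothetical sample data standing in for the (key, value) RDD in the question.
pairs = [("a", 3), ("b", 10), ("c", 7), ("d", 1), ("e", 9), ("f", 4), ("g", 8)]


def take_ordered(items, n, key=None):
    """Local stand-in for RDD.takeOrdered: the n smallest items by key, ascending."""
    return heapq.nsmallest(n, items, key=key)


# Negating the value turns "five smallest" into "five largest".
top5_by_value = take_ordered(pairs, 5, key=lambda kv: -kv[1])

# The approach from the question, done locally: sort descending, then slice.
top5_sorted = sorted(pairs, key=lambda kv: kv[1], reverse=True)[:5]

print(top5_by_value)
# [('b', 10), ('e', 9), ('g', 8), ('c', 7), ('f', 4)]
```

On a real RDD the equivalent call is the `takeOrdered(5, key = lambda x: -x[1])` line above. If the values are not numeric (strings, for example), `RDD.top(5, key = lambda x: x[1])` should give the five largest in descending order without needing a negation.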

Upvotes: 79
