WsnCode

Reputation: 49

PySpark takeOrdered by key

I have an RDD and can't sort my results by the second element of each tuple (the number):

rdd = sc.parallelize([(' user3 ', 24),
                      (' user1 ', 38),
                      (' dhg ', 22),
                      (' user2 ', 5),
                      (' root ', 28),
                      (' fido ', 1)])

rdd.takeOrdered(5)

which returns

[(' dhg ', 22), (' fido ', 1), (' root ', 28), (' user1 ', 38), (' user2 ', 5)]

My desired result is:

[('user1', 38), ('root', 28), ('dhg', 22), ('user2', 5), ('fido', 1)]

That is, ordered by the number, in descending order.

PS: These values were obtained using reduceByKey to count elements.

Any clue on this?

Upvotes: 0

Views: 247

Answers (1)

blackbishop

Reputation: 32700

You can pass a custom key function as the second argument of takeOrdered. Negating the count sorts the numbers in descending order:

values = rdd.takeOrdered(5, lambda x: -x[1])

print(values)
#[(' user1 ', 38), (' root ', 28), (' user3 ', 24), (' dhg ', 22), (' user2 ', 5)]
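If you want to check the ordering logic without a Spark session, takeOrdered(n, key) behaves like taking the n smallest elements under the given key, so negating the count is the same as taking the n largest counts. A minimal pure-Python sketch (the pairs list stands in for your RDD's contents):

```python
import heapq

# Same data as the RDD above, as a plain list.
pairs = [(' user3 ', 24), (' user1 ', 38), (' dhg ', 22),
         (' user2 ', 5), (' root ', 28), (' fido ', 1)]

# Equivalent of rdd.takeOrdered(5, lambda x: -x[1]):
# the 5 pairs with the largest second element, largest first.
top5 = heapq.nlargest(5, pairs, key=lambda x: x[1])
print(top5)
# [(' user1 ', 38), (' root ', 28), (' user3 ', 24), (' dhg ', 22), (' user2 ', 5)]
```

Note that takeOrdered collects the results to the driver; for large RDDs where you need the full sorted dataset distributed, sortBy with ascending=False is the usual alternative.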

Upvotes: 1
