Reputation: 49
I have an RDD and can't sort my results by the second element of each pair (the number).
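from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # in the pyspark shell, sc already exists
rdd = sc.parallelize([(' user3 ', 24),
                      (' user1 ', 38),
                      (' dhg ', 22),
                      (' user2 ', 5),
                      (' root ', 28),
                      (' fido ', 1)])
rdd.takeOrdered(5)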
This returns:
[(' dhg ', 22), (' fido ', 1), (' root ', 28), (' user1 ', 38), (' user2 ', 5)]
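Without a key, takeOrdered relies on Python's natural ordering of the elements, and tuples compare element by element: the string is compared first and the number only breaks ties. The plain-Python sketch below (no Spark involved; sorted() stands in for takeOrdered here) reproduces the output above:

```python
# Same pairs as in the RDD above, as a plain Python list.
pairs = [(' user3 ', 24), (' user1 ', 38), (' dhg ', 22),
         (' user2 ', 5), (' root ', 28), (' fido ', 1)]

# Tuples compare lexicographically: first by the string, then by the number.
default_order = sorted(pairs)[:5]
print(default_order)
# [(' dhg ', 22), (' fido ', 1), (' root ', 28), (' user1 ', 38), (' user2 ', 5)]
```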
My desired result is:
[('user1', 38), ('root', 28), ('user3', 24), ('dhg', 22), ('user2', 5)]
That is, the pairs ordered by the number, in descending order.
PS: These values were obtained with reduceByKey, counting the elements per key.
Any idea how to do this?
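For context, counting with reduceByKey usually means mapping each element to a (key, 1) pair and summing the ones per key, i.e. rdd.map(lambda k: (k, 1)).reduceByKey(lambda a, b: a + b). The sketch below mimics that computation in plain Python, without Spark. The input list `events` is hypothetical, and stripping the spaces from the keys is an assumption based on the desired output shown above; in Spark it would go in the map step.

```python
# Hypothetical raw input: one entry per occurrence, with stray spaces.
events = [' user1 ', ' root ', ' user1 ', ' fido ', ' root ', ' user1 ']

# Step 1 (like rdd.map): strip the spaces and pair each key with a count of 1.
pairs = [(name.strip(), 1) for name in events]

# Step 2 (like reduceByKey(lambda a, b: a + b)): sum the counts per key.
counts = {}
for key, one in pairs:
    counts[key] = counts.get(key, 0) + one

print(sorted(counts.items()))
# [('fido', 1), ('root', 2), ('user1', 3)]
```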
Upvotes: 0
Views: 247
Reputation: 32700
You can pass a custom key function to takeOrdered, like this:
# Negate the count so that larger numbers sort first.
values = rdd.takeOrdered(5, key=lambda x: -x[1])
print(values)
# [(' user1 ', 38), (' root ', 28), (' user3 ', 24), (' dhg ', 22), (' user2 ', 5)]
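PySpark's takeOrdered is built on heapq.nsmallest (applied per partition, then merged), so negating the count turns "the smallest" into "the largest". RDD.top(num, key=...) is built on heapq.nlargest, so rdd.top(5, key=lambda x: x[1]) should give the same descending result without the negation. The plain-Python sketch below reproduces both locally; the key-stripping step is an extra assumption to match the format asked for in the question, not part of the answer above.

```python
import heapq

pairs = [(' user3 ', 24), (' user1 ', 38), (' dhg ', 22),
         (' user2 ', 5), (' root ', 28), (' fido ', 1)]

# Like rdd.takeOrdered(5, key=lambda x: -x[1]):
# the 5 "smallest" by negated count are the 5 largest counts.
top_by_negation = heapq.nsmallest(5, pairs, key=lambda x: -x[1])

# Like rdd.top(5, key=lambda x: x[1]): largest counts, in descending order.
top_direct = heapq.nlargest(5, pairs, key=lambda x: x[1])

# Optional extra step (assumption): strip the padding from the keys,
# as in the desired output from the question.
cleaned = [(name.strip(), count) for name, count in top_by_negation]
print(cleaned)
# [('user1', 38), ('root', 28), ('user3', 24), ('dhg', 22), ('user2', 5)]
```

In Spark itself, the same cleanup could be done with rdd.map(lambda kv: (kv[0].strip(), kv[1])) before calling takeOrdered or top.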
Upvotes: 1