Liu Chong

Reputation: 343

How to Get the Largest Key of each Spark Partition?

If we use .reduce(max), we get the largest key in the whole RDD. I know this reduce operates within each partition first and then combines the items sent back by each partition. But how can we get back the largest key of every partition? Should I write a function for .mapPartitions()?
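For example (a minimal PySpark sketch with hypothetical data; `sc` is the SparkContext):

# Pair RDD spread over 2 partitions
pairs = sc.parallelize([(3, "a"), (7, "b"), (1, "c"), (9, "d")], 2)

# max compares (key, value) tuples, so .reduce(max) returns the single
# element with the largest key in the whole RDD, not one per partition
print(pairs.reduce(max))  # (9, 'd')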

Upvotes: 0

Views: 958

Answers (1)

user6022341

Reputation:

You can:

rdd.mapPartitions(iter => Iterator(iter.reduce(Math.max)))

or

rdd.mapPartitions(lambda iter: [max(iter)])
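A fuller sketch of the Python version (hypothetical data; note that the built-in max raises ValueError on an empty partition, so this variant skips those):

from pyspark import SparkContext

sc = SparkContext("local[2]", "partition-max")  # hypothetical local setup
pairs = sc.parallelize([(3, "a"), (7, "b"), (1, "c"), (9, "d")], 2)

def partition_max(it):
    # Track the running maximum without materializing the iterator
    best = None
    for kv in it:
        if best is None or kv > best:
            best = kv
    # Yield nothing for an empty partition instead of raising
    if best is not None:
        yield best

print(pairs.mapPartitions(partition_max).collect())
# [(7, 'b'), (9, 'd')] -- one largest element per partition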

In streaming, use this with DStream.transform.
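In PySpark Streaming that could look like the following sketch (assuming `stream` is an existing DStream of key-value pairs and `partition_max` is the helper above):

# transform applies an RDD-to-RDD function to every micro-batch
per_partition_max = stream.transform(lambda rdd: rdd.mapPartitions(partition_max))
per_partition_max.pprint()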

Upvotes: 2
