Reputation: 343
If we use .reduce(max)
then we will get the largest key in the whole RDD. I know this reduce will operate on all partitions and then reduce those items sent by each partition. But how can we get back the largest key of every partition? Write a function for .mapPartitions()
?
Upvotes: 0
Views: 958
Reputation:
You can:
rdd.mapParitions(iter => Iterator(iter.reduce(Math.max)))
or
rdd.mapPartitions(lambda iter: [max(iter)])
In streaming use this with DStream.trasform
.
Upvotes: 2