Reputation: 311
I'd like to take an RDD of integer lists and reduce it down to one list by summing element-wise. For example...
[1, 2, 3, 4]
[2, 3, 4, 5]
to
[3, 5, 7, 9]
I can do this in Python using the zip function, but I'm not sure how to replicate it in Spark. The only approach I know is to call collect on the RDD, but I want to keep the data in the RDD.
Upvotes: 0
Views: 175
Reputation: 215057
If all elements in rdd are of the same length, you can use reduce with zip:
rdd = sc.parallelize([[1,2,3,4],[2,3,4,5]])
rdd.reduce(lambda x, y: [i+j for i, j in zip(x, y)])
# [3, 5, 7, 9]
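One caveat with the lambda above: zip silently truncates to the shorter list, so a length mismatch would go unnoticed. A minimal sketch of a named function that checks lengths first is shown below. The name add_elementwise is my own, not part of Spark, and the example uses functools.reduce over a plain Python list as a local stand-in for the RDD so it runs without a cluster.

```python
from functools import reduce


def add_elementwise(x, y):
    """Add two equal-length lists position by position."""
    if len(x) != len(y):
        # zip would silently drop the extra items, so fail loudly instead
        raise ValueError(f"length mismatch: {len(x)} != {len(y)}")
    return [i + j for i, j in zip(x, y)]


# Local stand-in for the RDD's contents (hypothetical sample data)
rows = [[1, 2, 3, 4], [2, 3, 4, 5], [10, 20, 30, 40]]
total = reduce(add_elementwise, rows)
print(total)  # [13, 25, 37, 49]
```

Element-wise addition is commutative and associative, which is what RDD.reduce requires, so the same function should be usable as rdd.reduce(add_elementwise).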
Upvotes: 1