pad11

Reputation: 311

Adding lists by element in pyspark

I'd like to take an RDD of integer lists and reduce it down to one list by summing element-wise. For example...

[1, 2, 3, 4]
[2, 3, 4, 5]

to

[3, 5, 7, 9]

I can do this in Python using the zip function, but I'm not sure how to replicate it in Spark other than calling collect on the object, and I want to keep the data in an RDD.
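For reference, the plain-Python version of the element-wise sum described above, using zip (this local step is what needs replicating on the RDD without collect):

```python
a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

# Element-wise sum of two equal-length lists via zip
result = [i + j for i, j in zip(a, b)]
print(result)  # [3, 5, 7, 9]
```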

Upvotes: 0

Views: 175

Answers (1)

akuiper

Reputation: 215057

If all the lists in the RDD are the same length, you can use reduce with zip:

rdd = sc.parallelize([[1,2,3,4],[2,3,4,5]])

rdd.reduce(lambda x, y: [i+j for i, j in zip(x, y)])
# [3, 5, 7, 9]
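Note that plain zip silently truncates to the shorter list, so this assumes equal lengths. If the lists may vary in length, one sketch (using itertools.zip_longest to pad with zeros, an assumption beyond the original answer) passes a tolerant merge function to reduce; the logic is demonstrated below with functools.reduce, which applies the same associative function Spark's rdd.reduce would:

```python
from functools import reduce
from itertools import zip_longest

# Hypothetical helper: element-wise sum that pads the shorter
# list with 0 instead of truncating (assumption, not from the answer).
def add_lists(x, y):
    return [i + j for i, j in zip_longest(x, y, fillvalue=0)]

# Locally, functools.reduce mirrors what rdd.reduce(add_lists) computes.
data = [[1, 2, 3, 4], [2, 3, 4, 5], [10, 20]]
print(reduce(add_lists, data))  # [13, 25, 7, 9]
```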

Upvotes: 1
