loner_code_monkey

Reputation: 5

PySpark Reduce on RDD with only single element

Is there any way to deal with an RDD that contains only a single element (this can sometimes happen for what I am doing)? In that case, reduce stops working, since the operation requires 2 inputs.

I am working with key-value pairs such as:

(key1, 10),
(key2, 20),

And I want to aggregate their values, so the result should be:

30

But there are cases where the RDD contains only a single key-value pair, and then reduce does not work. For example:

(key1, 10)

Here the reduce returns nothing useful.

Upvotes: 0

Views: 525

Answers (1)

mck

Reputation: 42392

If you call .values() before doing the reduce, it should work even if there is only 1 element in the RDD, because reduce over a single value simply returns that value:

from operator import add

rdd = sc.parallelize([('key1', 10),])

rdd.values().reduce(add)
# 10
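To see why this works without a Spark cluster, here is a pure-Python analogue (using `functools.reduce` rather than PySpark, so the names `pairs` and `values` are illustrative, not part of the Spark API). Extracting the values first, as `.values()` does, means reduce always sees plain numbers, and reduce over a one-element sequence just returns that element:

```python
from functools import reduce
from operator import add

# Single key-value pair, mirroring sc.parallelize([('key1', 10)])
pairs = [('key1', 10)]

# Analogue of rdd.values(): keep only the values
values = [v for _, v in pairs]

# reduce over one element returns it unchanged
total = reduce(add, values)
# total == 10

# With several pairs the same code sums the values
many = [('key1', 10), ('key2', 20)]
total_many = reduce(add, [v for _, v in many])
# total_many == 30
```

The key point is the same in both Python and Spark: the reduce fails only when it is asked to combine raw `(key, value)` tuples; once the values are extracted, a single element needs no combining at all.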

Upvotes: 0
