I have the result of a map that looks like this:
[ ('success', '', 1), ('success', '', 1), ('error', 'something_random', 1), ('error', 'something_random', 1), ('error', 'something_random', 1) ]
Is there a way, using reduceByKey, to end up with:
[ ('success', 2), ('error', 3) ]
and then somehow write all the errors to a file?
Here are two options to get the result you need:
1) Convert each 3-element tuple to a 2-element (key, value) tuple, then use reduceByKey:
rdd.map(lambda x: (x[0], x[2])).reduceByKey(lambda x, y: x + y).collect()
# [('success', 2), ('error', 3)]
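To make the semantics of option 1 concrete without a Spark cluster, here is a minimal plain-Python sketch of what map followed by reduceByKey computes: values that share a key are merged pairwise with the supplied function. The helper reduce_by_key is hypothetical and is not part of PySpark. Also note that Spark does not guarantee the order of the pairs returned by collect(); this local version simply keeps first-seen key order.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]],
                  func: Callable[[V, V], V]) -> List[Tuple[K, V]]:
    """Hypothetical local stand-in for RDD.reduceByKey (not the Spark API)."""
    acc: Dict[K, V] = {}
    for key, value in pairs:
        # The first value seen for a key starts its accumulator;
        # every later value for that key is folded in with func.
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())


records = [
    ('success', '', 1), ('success', '', 1),
    ('error', 'something_random', 1), ('error', 'something_random', 1),
    ('error', 'something_random', 1),
]

# Same two steps as option 1: keep (status, count), then sum counts per status.
pairs = [(x[0], x[2]) for x in records]
print(reduce_by_key(pairs, lambda a, b: a + b))
# [('success', 2), ('error', 3)]
```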
2) Use groupBy on the first element of each tuple, then sum the values (the third element) in each group using mapValues:
rdd.groupBy(lambda x: x[0]).mapValues(lambda g: sum(x for _,_,x in g)).collect()
# [('success', 2), ('error', 3)]
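Option 2 can be sketched locally in the same way: group whole records by their first element, then apply a function to each group's list of records. The helpers group_by and map_values below are hypothetical plain-Python stand-ins, not PySpark calls, and the output order again reflects first-seen keys rather than anything Spark guarantees.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def group_by(items: Iterable[T],
             key_fn: Callable[[T], Hashable]) -> List[Tuple[Hashable, List[T]]]:
    """Hypothetical local stand-in for RDD.groupBy: collect whole items per key."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return list(groups.items())


def map_values(pairs: Iterable[Tuple[Hashable, T]],
               fn: Callable[[T], R]) -> List[Tuple[Hashable, R]]:
    """Hypothetical local stand-in for RDD.mapValues: transform values, keep keys."""
    return [(k, fn(v)) for k, v in pairs]


records = [
    ('success', '', 1), ('success', '', 1),
    ('error', 'something_random', 1), ('error', 'something_random', 1),
    ('error', 'something_random', 1),
]

grouped = group_by(records, lambda x: x[0])
# Each group is a list of full 3-tuples; sum their third elements.
totals = map_values(grouped, lambda g: sum(x for _, _, x in g))
print(totals)
# [('success', 2), ('error', 3)]
```

The sketch shows why option 1 is generally preferred for plain aggregation: groupBy materializes every record of a group before summing, whereas Spark's reduceByKey can combine values within each partition before the shuffle.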