Reputation: 516
I have batches of key-value pairs in a Spark Streaming context, where each element is a list of key-value pairs. How can I aggregate by key using reduceByKey when the elements are lists of key-value pairs? Example:
[("key1",2),("key2",3)]
[("key1",4),("key3",2)]
[("key2",4),("key3",2)]
The expected aggregated output:
("key1", 6)
("key2", 7)
("key3", 4)
Upvotes: 0
Views: 896
Reputation: 215117
Flatten it first with flatMap, then reduceByKey:
val rdd = sc.parallelize(Seq(Seq(("key1",2),("key2",3)), Seq(("key1",4),("key3",2)), Seq(("key2",4),("key3",2))))
rdd.flatMap(identity).reduceByKey(_+_).collect
// res2: Array[(String, Int)] = Array((key1,6), (key2,7), (key3,4))
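The flatten-then-reduce logic can also be sketched with plain Scala collections, with no SparkContext needed, which makes the equivalence easy to verify locally: `flatten` plays the role of `flatMap(identity)`, and `groupBy` plus a per-key sum plays the role of `reduceByKey(_ + _)`. The `batches` value below just mirrors the sample input from the question.

```scala
// Plain-Scala sketch of the same aggregation, independent of Spark.
val batches = Seq(
  Seq(("key1", 2), ("key2", 3)),
  Seq(("key1", 4), ("key3", 2)),
  Seq(("key2", 4), ("key3", 2))
)

// flatten ~ flatMap(identity); groupBy + sum ~ reduceByKey(_ + _)
val aggregated: Map[String, Int] =
  batches.flatten
    .groupBy(_._1)
    .map { case (k, pairs) => k -> pairs.map(_._2).sum }

// aggregated: Map(key1 -> 6, key2 -> 7, key3 -> 4)
```

On a DStream whose records are such lists, the same `flatMap(identity).reduceByKey(_ + _)` chain applies per batch, since pair DStreams expose `reduceByKey` just like pair RDDs.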
Upvotes: 2