Reputation: 516
I have batches of key-value pairs in a Spark Streaming context, where each element is a list of key-value pairs. How can I aggregate by key using reduceByKey when the elements are lists of key-value pairs? Example:
[("key1",2),("key2",3)]
[("key1",4),("key3",2)]
[("key2",4),("key3",2)]
The expected aggregated output:
("key1", 6)
("key2", 7)
("key3", 4)
Upvotes: 0
Views: 896
Reputation: 215117
Flatten it first with flatMap, then reduceByKey:
val rdd = sc.parallelize(Seq(Seq(("key1",2),("key2",3)), Seq(("key1",4),("key3",2)), Seq(("key2",4),("key3",2))))
rdd.flatMap(identity).reduceByKey(_+_).collect
// res2: Array[(String, Int)] = Array((key1,6), (key2,7), (key3,4))
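The flatten-then-reduce logic can also be sketched with plain Scala collections, with no SparkContext needed, which makes the equivalence easy to verify locally: `flatten` plays the role of `flatMap(identity)`, and `groupBy` plus a per-key sum plays the role of `reduceByKey(_ + _)`. The `batches` value below just mirrors the sample input from the question.

```scala
// Plain-Scala sketch of the same aggregation, independent of Spark.
val batches = Seq(
  Seq(("key1", 2), ("key2", 3)),
  Seq(("key1", 4), ("key3", 2)),
  Seq(("key2", 4), ("key3", 2))
)

// flatten ~ flatMap(identity); groupBy + sum ~ reduceByKey(_ + _)
val aggregated: Map[String, Int] =
  batches.flatten
    .groupBy(_._1)
    .map { case (k, pairs) => k -> pairs.map(_._2).sum }

// aggregated: Map(key1 -> 6, key2 -> 7, key3 -> 4)
```

On a DStream whose records are such lists, the same `flatMap(identity).reduceByKey(_ + _)` chain applies per batch, since pair DStreams expose `reduceByKey` just like pair RDDs.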
Upvotes: 2