Reduce rdd of maps

Question

I have an RDD like this:

Map(A -> Map(A1 -> 1))
Map(A -> Map(A2 -> 2))
Map(A -> Map(A3 -> 3))
Map(B -> Map(B1 -> 4))
Map(B -> Map(B2 -> 5))
Map(B -> Map(B3 -> 6))
Map(C -> Map(C1 -> 7))
Map(C -> Map(C2 -> 8))
Map(C -> Map(C3 -> 9))

I need the same RDD reduced by key, keeping all of the values each key had before:

Map(A -> Map(A1 -> 1, A2 -> 2, A3 -> 3))
Map(B -> Map(B1 -> 4, B2 -> 5, B3 -> 6))
Map(C -> Map(C1 -> 7, C2 -> 8, C3 -> 9))

I tried using reduce:

val prueba = replacements_2.reduce((x,y) => x ++ y)

But only the value of the last element evaluated for each key remains:

(A,Map(A3 -> 3))
(C,Map(C3 -> 9))
(B,Map(B3 -> 6))
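The values are lost because ++ on two Maps keeps the right-hand value whenever both maps contain the same key, so the inner map is replaced rather than merged. A minimal sketch with plain Scala collections (no Spark; the names first, second and merged are only for illustration), runnable as a scala-cli script or via :paste in the REPL:

```scala
// Plain Scala collections only (no Spark), to show why reduce(_ ++ _) drops entries.
val first  = Map("A" -> Map("A1" -> 1))
val second = Map("A" -> Map("A2" -> 2))

// Both maps share the key "A"; ++ keeps the right-hand value for that key,
// so Map("A1" -> 1) is replaced by Map("A2" -> 2) instead of being merged with it.
val merged = first ++ second
println(merged) // Map(A -> Map(A2 -> 2))
```

On an RDD, which inner map survives may also depend on the order in which Spark combines partition results, since RDD.reduce expects a commutative and associative function.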

Raphael Roth · Accepted Answer

I think you should model your data differently; your Map approach seems a bit awkward. Why represent a single entry as a Map with one element? A Tuple2 is better suited for this. Anyway, what you need is reduceByKey. To use it, you first need to convert your RDD into a key-value RDD:

rdd
  .map(m => (m.keys.head,m.values.head)) // create key-value RDD
  .reduceByKey((a,b) => a++b) // merge maps
  .map{case (k,v) => Map(k -> v)} // create Map again
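To check the merging logic without a cluster, here is a rough local analogue of the pipeline above using standard Scala collections instead of an RDD. This is an assumption-laden stand-in, not the answer's code: groupBy followed by reduce(_ ++ _) plays the role of reduceByKey, and data is simply the question's sample input. Run it as a scala-cli script or via :paste in the REPL:

```scala
// Local, Spark-free analogue of the answer's pipeline (standard Scala collections only).
// `data` is a hypothetical stand-in for the RDD, filled with the question's sample values.
val data: Seq[Map[String, Map[String, Int]]] = Seq(
  Map("A" -> Map("A1" -> 1)), Map("A" -> Map("A2" -> 2)), Map("A" -> Map("A3" -> 3)),
  Map("B" -> Map("B1" -> 4)), Map("B" -> Map("B2" -> 5)), Map("B" -> Map("B3" -> 6)),
  Map("C" -> Map("C1" -> 7)), Map("C" -> Map("C2" -> 8)), Map("C" -> Map("C3" -> 9))
)

val result: Seq[Map[String, Map[String, Int]]] = data
  // 1. Turn each single-entry Map into a (key, innerMap) tuple, as the answer's first map does.
  .map(m => (m.keys.head, m.values.head))
  // 2. Group the tuples by key and merge their inner maps; this stands in for reduceByKey.
  .groupBy(_._1)
  .map { case (k, pairs) => k -> pairs.map(_._2).reduce(_ ++ _) }
  // Sort by key only so the local output is deterministic.
  .toSeq
  .sortBy(_._1)
  // 3. Wrap each (key, mergedMap) pair back into a Map, as the answer's last map does.
  .map { case (k, v) => Map(k -> v) }

result.foreach(println)
```

Merging inner maps with ++ works here because within one key the inner keys (A1, A2, A3, ...) are distinct. On a real RDD, reduceByKey is generally preferable to groupByKey because it combines values within each partition before the shuffle.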
