Reputation: 379
RDDData===RET
(
12345,
20170201,
Map(12 -> 85, 15 -> 57, 00 -> 3, 09 -> 80, 21 -> 33, 03 -> 7, 18 -> 50, 06 -> 38, 17 -> 43, 23 -> 28, 11 -> 73, 05 -> 16, 14 -> 58, 08 -> 66, 20 -> 35, 02 -> 9, 01 -> 16, 22 -> 34, 16 -> 49, 19 -> 53, 10 -> 69, 04 -> 15, 13 -> 66, 07 -> 43),
Map(12 -> 4, 15 -> 4, 00 -> 4, 09 -> 4, 21 -> 4, 03 -> 4, 18 -> 4, 06 -> 4, 17 -> 4, 23 -> 4, 11 -> 4, 05 -> 4, 14 -> 4, 08 -> 4, 20 -> 4, 02 -> 4, 01 -> 4, 22 -> 4, 16 -> 4, 19 -> 4, 10 -> 4, 04 -> 4, 13 -> 4, 07 -> 4),
Map(12 -> 15, 15 -> 9, 00 -> 4, 09 -> 14, 21 -> 8, 03 -> 4, 18 -> 8, 06 -> 8, 17 -> 9, 23 -> 8, 11 -> 15, 05 -> 4, 14 -> 9, 08 -> 12, 20 -> 8, 02 -> 4, 01 -> 5, 22 -> 8, 16 -> 9, 19 -> 9, 10 -> 14, 04 -> 5, 13 -> 13, 07 -> 9)
)
I'm new to Spark and I don't know where to start. I have an RDD like the one shown above. Could you please help me extract the values from it?
I want to extract the values and join the maps in the 3rd, 4th and 5th columns by key.
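For context, an RDD like this could be reproduced with something like the following (a minimal sketch; the keys are assumed to be Strings since values such as 00 print with a leading zero, and the maps are truncated for brevity):
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("join-maps").setMaster("local[*]"))

// One element: (id, date, map3, map4, map5).
val rdd = sc.parallelize(Seq(
  (12345, 20170201,
    Map("12" -> 85, "15" -> 57, "00" -> 3),  // truncated
    Map("12" -> 4,  "15" -> 4,  "00" -> 4),  // truncated
    Map("12" -> 15, "15" -> 9,  "00" -> 4))  // truncated
))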
Thanks for the help
Upvotes: 0
Views: 979
Reputation: 2108
You can give it a try like this:
rdd.map { case (id, data, map3, map4, map5) =>
  // concatenate the three maps into one list of (key, value) pairs
  map3.toList ++ map4.toList ++ map5.toList
}
// group the pairs by key, keeping the matching values as a Seq
.map(l => l.groupBy(_._1).map { case (k, v) => k -> v.map(_._2).toSeq })
The first map function keeps only the three maps, converting each to a list and concatenating them into a single list of (key, value) pairs. The second map function groups those pairs by key and puts all the values in a sequence. So you will have an RDD containing only the map data, joined by key, with a Seq of the numbers that match each key as values. The output should be:
(12 -> [85,4,15], 15 -> [57,4,9], 00 -> [3,4,4] .....
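Putting it together with the sample element from the question (a sketch; rdd is the RDD built above, with String hour keys assumed):
val joined = rdd
  .map { case (id, data, map3, map4, map5) =>
    map3.toList ++ map4.toList ++ map5.toList          // List[(String, Int)]
  }
  .map(l => l.groupBy(_._1)                            // Map[String, List[(String, Int)]]
            .map { case (k, v) => k -> v.map(_._2).toSeq })

joined.collect().foreach(println)
// prints (key order may vary):
// Map(12 -> List(85, 4, 15), 15 -> List(57, 4, 9), 00 -> List(3, 4, 4))
Note that this drops the id and date columns; if you need them, keep them in the tuple returned from the first map and group only the concatenated list.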
Upvotes: 0