Reputation: 3270
I need to merge a list into a set from an RDD, but I got stuck doing it in Scala:
var accounts = Set("name" -> "", "id" -> 0, ....)
//Split the RDD into lines and split each line by `|` to get the values
stream.foreachRDD {_.map(_._2).flatMap(_.split("\\|")).foreach(f => /*merge here ?*/)}
How do I associate the values with my account sets?
For example, assume an RDD loaded from a CSV (I made up this data):
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
...
The RDD has up to 300 columns/fields.
My main objective is to convert it to JSON, but first I need to associate each value with a key by loading it into a map or a class.
var election = Map("firstname" -> "Donald",
  "lastname" -> "Trump",
  "country" -> "US",
  "event" -> "Election",
  "period" -> "March",
  "var1" -> "Spring",
  ....
  "varN" -> "...")
Upvotes: 2
Views: 206
Reputation: 3270
A bit of cleanup of @slouc's answer:
stream.foreachRDD {_.map(_._2).map(l => (mapKeys zip l.split("\\|")).toMap).saveToEs(conf)}
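The `\\|` escape matters here because `String.split` takes a regular expression, and a bare `|` is the regex alternation operator. A minimal sketch in plain Scala (no Spark needed) showing the difference; the `SplitDemo` object and the `line` value are made up for illustration:

```scala
object SplitDemo {
  // Hypothetical sample line, not taken from a real stream.
  val line = "Donald|Trump|US"

  // Unescaped: "|" is an alternation of two empty patterns, so it matches
  // between characters and the line is broken up roughly one character
  // per element instead of one field per element.
  val wrong: Array[String] = line.split("|")

  // Escaped: splits on the literal pipe character.
  val right: Array[String] = line.split("\\|")

  // Same result, letting Pattern.quote do the escaping.
  val quoted: Array[String] = line.split(java.util.regex.Pattern.quote("|"))

  // Same result again: the Char overload treats the separator literally.
  val byChar: Array[String] = line.split('|')
}
```

Any of the last three forms should work in the one-liner above; which one reads best is a matter of taste.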
Upvotes: 0
Reputation: 9698
I'm not sure if I understood correctly, but does this help?
val data = List(
  "Donald|Trump|US|Election|March",
  "John|Smith|UK|Election|February"
)
val mapKeys = List("firstname", "lastname", "country", "event", "period")
val election = data.map { row =>
  (mapKeys zip row.split("\\|").toList).map {
    case (key, value) => key -> value
  }.toMap
}
So, you will get a list of maps - for each row of your data you get a map of key/value pairs as you described.
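Two edge cases may be worth checking before applying this to rows with up to 300 fields: `zip` silently stops at the shorter of the two sequences, and `split` drops trailing empty strings unless it is given a negative limit. A minimal sketch, assuming it is acceptable to discard rows whose field count does not match the key count (the `RowParser` object and `toRecord` helper are hypothetical names introduced here, not part of any library):

```scala
object RowParser {
  val mapKeys: List[String] =
    List("firstname", "lastname", "country", "event", "period")

  // Returns None when the row does not have exactly one value per key,
  // instead of letting zip quietly truncate the result.
  def toRecord(keys: List[String], row: String): Option[Map[String, String]] = {
    // limit = -1 keeps trailing empty fields, so "a|b|" yields 3 values.
    val values = row.split("\\|", -1)
    if (values.length == keys.length) Some(keys.zip(values).toMap) else None
  }

  // A complete row, a row whose last field is empty, and a short row.
  val ok = toRecord(mapKeys, "Donald|Trump|US|Election|March")
  val emptyLast = toRecord(mapKeys, "Donald|Trump|US|Election|")
  val tooShort = toRecord(mapKeys, "Donald|Trump")
}
```

With this helper, `ok` and `emptyLast` are both defined (the latter with an empty `period`), while `tooShort` is `None`. On an RDD, something like `flatMap(l => toRecord(mapKeys, l))` would presumably keep only the well-formed rows, though whether dropping the rest is the right policy depends on the data.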
Upvotes: 1