Reputation: 3033
var myMap: Map[String, Int] = Map()
myRDD.foreach { data =>
  println("1. " + data.name + " : " + data.time)
  myMap += (data.name -> data.time)
  println("2. " + myMap)
}
println("Total Map : " + myMap)
Result
- A : 1
- Map(A -> 1)
- B : 2
- Map(B -> 2) // key A deleted
- C : 3
- Map(C -> 3) // keys A and B deleted
Total Map : Map() // nothing
Somehow I cannot accumulate data into the Map inside foreach. It seems to drop or reinitialize the previous entries every time a new key/value pair is added. Any idea what is going on?
Upvotes: 1
Views: 55
Reputation: 37435
Spark closures are serialized and executed in a separate context (remotely, on the executors, when running on a cluster). Each task therefore mutates its own deserialized copy of the map; the myMap variable on the driver is never updated.
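If the goal is simply to build the map on the driver, one option is to collect the records first and construct the Map locally. This is a minimal sketch, assuming myRDD holds records with name and time fields as in the question's code, and that the data fits in driver memory:

// Collect the records to the driver, then build an ordinary local Map.
// Field names (name, time) are taken from the question's snippet.
val myMap: Map[String, Int] =
  myRDD.collect().map(data => (data.name, data.time)).toMap

println("Total Map : " + myMap)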
To get the data from the RDD as a map, there's a built-in operation:
val myMap = rdd.collectAsMap()
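Note that collectAsMap() is defined on pair RDDs (RDD[(K, V)]), so the record RDD from the question would first be mapped to (name, time) tuples. A sketch reusing the field names from the question:

// Map each record to a (key, value) tuple, then collect the pairs
// as a Map on the driver. collectAsMap() returns a scala.collection.Map.
val myMap: scala.collection.Map[String, Int] =
  myRDD.map(data => (data.name, data.time)).collectAsMap()

println("Total Map : " + myMap)

As with collect(), this materializes the whole result on the driver, so it is only suitable when the resulting map is small.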
Upvotes: 1