Reputation: 3069
In a Spark program, I want to define a variable, such as an immutable map, that all worker programs can access synchronously. What can I do? Should I define a Scala object?
Beyond an immutable map, what if I want a variable that can be shared and updated synchronously, for example a mutable map, a 'var Int', or a 'var String'? What can I do? Is a Scala object variable OK? For example:
object SparkObj {
  var x: Int = 0
  var y: String = ""
}
Do x and y have only one copy instead of several copies?
Are updates to x and y synchronized?
Upvotes: 2
Views: 4611
Reputation: 27455
If you refer to a variable inside a closure that runs on the workers, it will be captured, serialized and sent to the workers. For example:
val i = 5
rdd.map(_ + i) // "i" is sent to the workers, they add 5 to each element.
Nothing is sent back from the workers, however. If you add something to a mutable.Seq inside a worker, the change will not be visible anywhere else. You will be modifying an object that is discarded after the closure has executed.
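A minimal sketch of this behavior, assuming an existing SparkContext named sc (e.g. in spark-shell) and illustrative RDD contents:

```scala
import scala.collection.mutable.ArrayBuffer

// Assumes an existing SparkContext `sc` (as in spark-shell).
val buffer = ArrayBuffer[Int]()            // lives on the driver
val rdd = sc.parallelize(Seq(1, 2, 3))

rdd.foreach { x =>
  // Each task mutates its own deserialized copy of `buffer`,
  // which is discarded when the task finishes.
  buffer += x
}

println(buffer.size)  // still 0 on the driver: the worker copies were thrown away
```

The closure is serialized per task, so even though the code compiles and runs without error, the driver's buffer is never updated.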
Apache Spark provides a number of primitives for performing distributed computing. Synchronized mutable state is not one of these.
Upvotes: 3