beta alpha
beta alpha

Reputation: 23

Access mutable property of immutable broadcast variable

I'm making Spark App but get stuck on broadcast variable. According to document, broadcast variable should be 'read only'. What if it's properties are mutable?

In local, it works like variable. I don't have cluster environment, so ...

case object Var {
   private var a = 1
   def get() = {
       a = a + 1
       a
   }
}

val b = sc.broadcast(Var)

// usage 
b.value.get   // => 2
b.value.get   // => 3
// ...

Is this wrong usage of broadcast? It seems destroy the broadcast variable's consistency.

Upvotes: 0

Views: 170

Answers (1)

ollik1
ollik1

Reputation: 4540

Broadcasts are moved from the driver JVM to executor JVMs once per executor. What happens is Var would get serialized on driver with its current a, then copied and deserialized to all executor JVMs. Let's say get was never called on driver before broadcasting. Now all executors get a copy of Var with a = 1 and whenever they call get, the value of a in their local JVM gets increased by one. That's it, nothing else happens and the changes of a won't get propagated to any other executor or the driver and the copies of Var will be out of sync.

Is this wrong usage of broadcast?

Well, the interesting question is why would you do that as only the initial value of a gets transferred. If the aim is to build local counters with a common initial value it technically works but there are much cleaner ways to implement that. If the aim is to get the value changes back to the driver then yes, it is wrong usage and accumulators should be used instead.

It seems destroy the broadcast variable's consistency.

Yep, definitely as explained earlier.

Upvotes: 1

Related Questions