femibyte

Reputation: 3507

Where does an accumulator variable in Spark live?

My assumption is that an accumulator is maintained within the SparkContext in the driver program. Unlike broadcast variables, an accumulator's value is not sent to the worker nodes in the cluster. Is this correct? If so, what are the mechanics of how it gets updated in the SparkContext? How are updates to it from the various executors on the worker nodes implemented? Is it a singleton object?

Upvotes: 0

Views: 1401

Answers (1)

Wellingr

Reputation: 1191

To quote from the Spark documentation:

Tasks running on the cluster can then add to it using the add method or the += operator (in Scala and Python). However, they cannot read its value. Only the driver program can read the accumulator’s value, using its value method.
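This contract can be sketched in plain Python (this is not Spark's actual implementation; the class and method names `DriverAccumulator`, `TaskAccumulator`, and `_merge` are made up for illustration). Tasks hold a write-only handle that supports `add`/`+=`, while only the driver-side object exposes `value`:

```python
# Toy sketch of the accumulator contract described above.
# Hypothetical names; not Spark source code.

class DriverAccumulator:
    """Driver-side accumulator: holds the authoritative value."""
    def __init__(self, initial=0):
        self._value = initial

    @property
    def value(self):
        # Only the driver reads the accumulated value.
        return self._value

    def _merge(self, delta):
        # Called on the driver when a task's local updates arrive.
        self._value += delta


class TaskAccumulator:
    """Executor-side handle: write-only, collects a local delta."""
    def __init__(self):
        self._delta = 0

    def add(self, term):
        # Tasks may only add; there is no read accessor here.
        self._delta += term

    def __iadd__(self, term):
        # Supports the `acc += x` form mentioned in the docs.
        self.add(term)
        return self
```

In this sketch, a task calling `value` on its handle would simply fail, mirroring the rule that tasks "cannot read its value".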

Looking at the implementation, it seems that the accumulator keeps its value at the driver side (where it can be read).

As for the executors: looking at the implementation, the accumulator is registered with the TaskContext upon deserialization. So each executor keeps its own internal copy of the accumulator, which is later merged into the real accumulator on the driver.
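The merge described above can be simulated in a few lines of plain Python (a toy model, not Spark code): each "task" starts from a fresh local accumulator and the driver folds the per-task deltas back into the real one.

```python
# Toy simulation of executor-local accumulators being merged on the driver.
# `run_task` stands in for a Spark task; this is illustrative, not Spark source.

def run_task(partition):
    """Simulates one task: deserialization gives it a fresh local accumulator."""
    local_acc = 0            # executor-local copy, starts at the zero value
    for record in partition:
        local_acc += 1       # the task adds to its *local* accumulator only
    return local_acc         # shipped back to the driver with the task result

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

driver_acc = 0               # the "real" accumulator lives on the driver
for result in map(run_task, partitions):
    driver_acc += result     # driver merges each task's delta on completion

print(driver_acc)            # → 9 (total records across all partitions)
```

This also shows why the driver sees a consistent total without the tasks ever reading or racing on a shared variable: all cross-node communication is the one-way shipping of deltas.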

The accumulator is not a singleton object, since multiple accumulators can be created. However, the executors have a means to communicate their updates back to the original accumulator in the driver application.

Upvotes: 1
