fluency03

Reputation: 2697

Spark's accumulator is confusing me.

I am practicing Apache Spark and ran into the following problem.

val accum = sc.accumulator( 0, "My Accumulator.")
println (accum)  // print out: 0

sc.parallelize( Array(1, 2, 3, 4, 5) ).foreach( x => accum += x ) 
// sc.parallelize( Array(1, 2, 3, 4, 5) ).foreach( x => accum = accum + x )
println( accum.value ) // print out: 15

The line sc.parallelize( Array(1, 2, 3, 4, 5) ).foreach( x => accum += x ) works fine, but the commented-out line below it does not. The difference is between:

x => accum += x

and

x => accum = accum + x

Why doesn't the second one work?

Upvotes: 5

Views: 527

Answers (2)

zero323

Reputation: 330453

There are three reasons why it doesn't work:

  1. accum is a val, so it cannot be reassigned
  2. the Accumulable class, which is the base class for Accumulator, provides only a += method, not +
  3. accumulators are write-only from the worker perspective, so you cannot read their value inside an action. Theoretically a + method could modify accum in place, but that would be rather confusing.
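The first two points can be illustrated without Spark at all, using a minimal stand-in class (hypothetical, for illustration only) that, like Accumulable, defines += but no +:

```scala
// Minimal stand-in for Spark's Accumulator: it mutates its state in place
// via += and deliberately defines no + method, mirroring Accumulable's API.
class MiniAccumulator(private var current: Int) {
  def +=(term: Int): Unit = { current += term } // in-place update, returns Unit
  def value: Int = current
}

val accum = new MiniAccumulator(0)

Array(1, 2, 3, 4, 5).foreach(x => accum += x) // fine: calls accum.+=(x)
println(accum.value)                          // prints 15

// Array(1, 2, 3, 4, 5).foreach(x => accum = accum + x)
// fails to compile twice over:
//   - "reassignment to val": accum is a val, so it cannot be rebound
//   - "value + is not a member of MiniAccumulator": no + method exists
```

Because accum is a val holding a mutable object, accum += x is resolved as the method call accum.+=(x), and no reassignment of accum ever happens.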

Upvotes: 6

Alberto Bonsanto

Reputation: 18042

Believe it or not, an Accumulator in Apache Spark works like a write-only global variable. In imperative thinking we see no difference between x += 1 and x = x + 1, but there is a subtle one: in Apache Spark the second operation would require reading the value, while the first one doesn't. Or, put more simply (as zero323 said in his explanation), the + method isn't implemented for that class. You can read about how it works on p. 41 of the Apache Spark slides, which are extracted from the Introduction to Big Data with Apache Spark course.
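The distinction above comes from how Scala desugars compound assignment: for a plain var with no += method, x += 1 is rewritten to x = x + 1, so the two forms are equivalent; but when the type itself defines +=, that method is called directly and no read-then-rebind occurs. A small sketch (the Counter class is hypothetical, standing in for Spark's Accumulator):

```scala
// For a plain Int var, x += 1 desugars to x = x + 1: the two are the same.
var x = 0
x += 1
x = x + 1
assert(x == 2)

// But when a class defines its own +=, that method is called instead,
// with no reassignment. Spark's Accumulator relies on exactly this.
class Counter(private var n: Int) {
  def +=(d: Int): Unit = { n += d }
  def count: Int = n
}

val c = new Counter(0) // a val: cannot be reassigned
c += 5                 // calls c.+=(5); c itself is never rebound or read
assert(c.count == 5)
// c = c + 5           // would not compile: reassignment to val, and no + method
```

This is why Spark can expose += on a write-only accumulator: the operation never needs to read the current value on the worker side.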

Upvotes: 4
