Reputation: 2697
I am practicing Apache Spark but encountering the following problem.
val accum = sc.accumulator( 0, "My Accumulator.")
println (accum) // print out: 0
sc.parallelize( Array(1, 2, 3, 4, 5) ).foreach( x => accum += x )
// sc.parallelize( Array(1, 2, 3, 4, 5) ).foreach( x => accum = accum + x )
println( accum.value ) // print out: 15
This line of code sc.parallelize( Array(1, 2, 3, 4, 5) ).foreach( x => accum += x )
is working quite well, but the code commented out below it is not working. The difference is:
x => accum += x
and
x => accum = accum + x
Why the second one is not working?
Upvotes: 5
Views: 527
Reputation: 330453
There are three reasons why it doesn't work:
accum
is a value so it cannot be reassigned Accumulable
class, which is a base class for Accumulator
provides only +=
method, not +
+
method could modify accum
in place, but it would be rather confusing. Upvotes: 6
Reputation: 18042
Because believing it or not, an Accumulator
in Apache Spark
works like a write-only global variable, in our imperative thinking we don't see any difference between x += 1
and x = x + 1
, but there is a slight difference, the second operation in Apache Spark
would require to read the value, but the first one wouldn't, or in an easier (how zero said in his explanation) the method +
isn't implemented for that class. Apache Spark on p. 41, you can read about how it works, the slides are extracted from the Introduction to Big Data with Apache Spark
Upvotes: 4