Spark: Difference between accumulator and local variable

Question

While exploring Spark accumulators, I tried to demonstrate the difference between an accumulator and a regular variable. But the output does not match my expectation: both the accumulator and the counter have the same value at the end of the program, and I am able to read the accumulator inside the transformation function (according to the docs, only the driver can read an accumulator). Am I doing something wrong? Is my understanding correct?

import org.apache.spark.sql.SparkSession

object Accmulators extends App {

  val spark = SparkSession.builder().appName("Accmulators").master("local[3]").getOrCreate()

  // Accumulator created on the driver
  val cntAccum = spark.sparkContext.longAccumulator("counterAccum")

  val file = spark.sparkContext.textFile("resources/InputWithBlank")

  // Regular variable, for comparison with the accumulator
  var counter = 0

  // Counts blank lines as a side effect and returns the words of the line
  def countBlank(line: String): Array[String] = {
    val trimmed = line.trim
    if (trimmed == "") {
      cntAccum.add(1)
      cntAccum.value // reading accumulator
      counter += 1
    }
    line.split(" ")
  }

  file.flatMap(line => countBlank(line)).collect()

  println(cntAccum.value)

  println(counter)
}

The input file contains text with 9 empty lines in between, for example:

4) Aggregations and Joins

5) Spark SQL

6) Spark App tuning

Output:

Both counter and cntAccum give the same result.
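A minimal sketch of one way to make the difference visible, assuming the identical values come from `counter` being a field of the singleton object, which under `master("local[3]")` lives in the same JVM as the tasks. Here the counter is instead a local variable captured by the closure, so each task would work on its own deserialized copy and the driver's copy would be expected to stay at 0, while the accumulator updates are merged back to the driver. The names `AccumulatorVsLocal`, `countBlanks`, `blankAccum`, `localCounter` and the in-memory input are hypothetical and not part of the original program.

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorVsLocal {

  // Returns (accumulator value, driver-side value of the local counter).
  def countBlanks(spark: SparkSession, input: Seq[String]): (Long, Int) = {
    val sc = spark.sparkContext
    val blankAccum = sc.longAccumulator("blankAccum")

    // Local to this method, so the closure below captures it and
    // Spark serializes it together with the task.
    var localCounter = 0

    sc.parallelize(input, 3).foreach { line =>
      if (line.trim.isEmpty) {
        blankAccum.add(1)  // per-task update, merged on the driver
        localCounter += 1  // updates the copy that was shipped with the task
      }
    }

    (blankAccum.value.longValue(), localCounter)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AccumulatorVsLocal")
      .master("local[3]")
      .getOrCreate()

    // Assumed sample input: five lines, two of them blank.
    val (accum, local) = countBlanks(spark, Seq("a b", "", "c", "   ", "d e"))
    println(s"accumulator = $accum")
    println(s"local var   = $local")

    spark.stop()
  }
}
```

The update is done inside `foreach`, an action, because Spark only guarantees that accumulator updates are applied exactly once per task when they happen inside actions; updates made inside transformations such as `flatMap` may be applied more than once if a task or stage is re-executed.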
