user1532146

Reputation: 313

Why does a Spark Accumulator's output type need to be thread-safe?

According to the AccumulatorV2 documentation, the output should be a type that can be read atomically (e.g., Int, Long) or thread-safely (e.g., synchronized collections), because it will be read from other threads.

Let's say I have a class called CheckSumAccumulator that extends AccumulatorV2 and whose output type is CheckSum. CheckSumAccumulator has a private field called checkSum, and CheckSum has a private field called count with public getter and setter methods.

import java.io.Serializable;

import org.apache.spark.util.AccumulatorV2;

public class CheckSumAccumulator extends AccumulatorV2<String, CheckSum> {
    private CheckSum checkSum;
    ...
}

public class CheckSum implements Serializable {
    private long count;

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }
}

What could go wrong? Does the Accumulator instance run in a single thread on each executor?

Upvotes: 0

Views: 163

Answers (1)

Abdennacer Lachiheb

Reputation: 4888

A Spark accumulator is a shared variable that can be used to accumulate values across multiple tasks and stages in a Spark job. Because it is shared across multiple threads, it must be thread-safe so that updates to its value are atomic and consistent across all tasks and stages. If the accumulator were not thread-safe, race conditions could produce inconsistent results. Thread safety is typically achieved with synchronization mechanisms such as locks or atomic operations.
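As a minimal sketch of that idea applied to the question's example: one option is to hold the count in an AtomicLong (or make the getter/setter synchronized) so that the object returned by value() can be read safely from another thread. The class names SafeCheckSum / SafeCheckSumAccumulator and the "length of each record" checksum rule below are only placeholders for illustration, not the asker's actual logic.

import java.io.Serializable;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.spark.util.AccumulatorV2;

// Hypothetical: the count is held in an AtomicLong so that a reader on
// another thread (e.g. the driver polling accumulator updates for the
// web UI) never sees a torn or stale value.
class SafeCheckSum implements Serializable {
    private final AtomicLong count = new AtomicLong(0L);

    public long getCount()           { return count.get(); }
    public void setCount(long value) { count.set(value); }
    public void addCount(long delta) { count.addAndGet(delta); }
}

public class SafeCheckSumAccumulator extends AccumulatorV2<String, SafeCheckSum> {
    private final SafeCheckSum checkSum = new SafeCheckSum();

    @Override
    public boolean isZero() {
        return checkSum.getCount() == 0L;
    }

    @Override
    public AccumulatorV2<String, SafeCheckSum> copy() {
        SafeCheckSumAccumulator copy = new SafeCheckSumAccumulator();
        copy.checkSum.setCount(checkSum.getCount());
        return copy;
    }

    @Override
    public void reset() {
        checkSum.setCount(0L);
    }

    @Override
    public void add(String v) {
        // Hypothetical checksum rule: accumulate the length of each record.
        checkSum.addCount(v == null ? 0L : v.length());
    }

    @Override
    public void merge(AccumulatorV2<String, SafeCheckSum> other) {
        checkSum.addCount(other.value().getCount());
    }

    @Override
    public SafeCheckSum value() {
        return checkSum;
    }
}

AtomicLong is itself Serializable, so the object can still be shipped between executors and the driver. Alternatively, the getter and setter could be made synchronized, or value() could return an immutable snapshot such as a plain Long.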

Upvotes: 0
