ericVE

Reputation: 41

Beam DoFn static variable shared across JVM

So, I'm trying to figure out the behaviour of static variables in a Beam DoFn: are they shared between threads (within the same JVM)?

Basically trying to understand the following from the programming guide:

4.3.2. Thread-compatibility
… Note that static members in your function object are not passed to worker instances and that multiple instances of your function may be accessed from different threads.

https://beam.apache.org/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms

Now, it seems that the following static object "counter" gets initialized, serialized and applied in the worker (Flink engine). Is that aligned with the above statement?

If worker threads fall in different processes/JVMs, it obviously will not be shared. But if they fall in the same JVM, will "counter" be shared?

public class myTransform extends DoFn<KV<String >,String> implements Serializable {
    private static AtomicLong counter = new AtomicLong(0);
    ...
    @ProcessElement
    public void processElement(ProcessContext c) {
        ...
        counter.incrementAndGet();
    }
}

thanks

Upvotes: 2

Views: 1051

Answers (1)

bottaio

Reputation: 5093

I think the initialization part refers to, for example, setting some value in the DoFn's constructor. Your static field is still initialized on the worker, because the worker has to load the myTransform class.

If they happen to run in the same JVM then yes, the counter is going to be shared. What the Beam people tried to convey is that you shouldn't base your logic on that anyway, since a parallel instance of an operator might be executed on any node.
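To illustrate the point without a Beam runner, here is a minimal plain-Java sketch. It assumes the worker behaves roughly like ordinary Java serialization followed by class loading in one JVM; FakeFn, roundTrip and run are hypothetical names for this example only, not Beam API. It shows that the static field is not part of the serialized object, yet every deserialized copy in the same JVM (same class loader) increments the same counter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicLong;

class StaticFieldDemo {

    // Hypothetical stand-in for a DoFn: one static field, one instance field.
    static class FakeFn implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static: belongs to the loaded class, never written to the serialized form.
        static final AtomicLong counter = new AtomicLong(0);

        // Instance state: this is what actually travels with the serialized object.
        final String label;

        FakeFn(String label) {
            this.label = label;
        }

        void process() {
            counter.incrementAndGet();
        }
    }

    // Serialize and deserialize, roughly what happens when a function object is shipped.
    static FakeFn roundTrip(FakeFn fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FakeFn) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs two deserialized copies on two threads; returns how much the counter grew.
    static long run() {
        long before = FakeFn.counter.get();
        FakeFn original = new FakeFn("original");
        FakeFn copyA = roundTrip(original);
        FakeFn copyB = roundTrip(original);

        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) copyA.process(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) copyB.process(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return FakeFn.counter.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("counter grew by " + run());
    }
}
```

Both copies are distinct objects, but they see one counter because there is one loaded class per class loader. In a second JVM (another worker process) the class is loaded again and the counter starts from its initializer, which is why logic should not depend on the shared value.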

Upvotes: 1
