theShadow89
theShadow89

Reputation: 1549

Using Params Inside a PTransform Apache Beam

I have implemented a PTransform:

public class MyTransform extends PTransform<PCollection<I>, PCollection<T>> {

    private Object obj;


    public MyTransform(Object obj) {
        this.obj = obj;
    }

    @Override
    public PCollection<O> expand(PCollection<I> input) {

        return input.apply(ParDo.of(new DoFn<I, O>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                I in = c.element();
                c.output(obj.m1(in));
            }
        }));
    }
}

I need to use Object parameter inside DoFn. The Object parameter, used inside MyTransform, is serialized multiple times to DoFn function or only when it is created?

The Beam Documentation not cover the case with parameters

Upvotes: 0

Views: 1389

Answers (2)

Andrew Nguonly
Andrew Nguonly

Reputation: 2621

Just for clarification of terms, obj is never actually "passed" to the DoFn class. obj is initialized as a private member of the MyTransform class and is accessible inside the scope of the expand method. MyTransform, and therefore obj, are initialized once for each time MyTransform is serialized for execution.

So to answer your question, obj is "passed" to the DoFn class only when MyTransform is created. But again, this is not the correct terminology.

Upvotes: 0

Jiayuan Ma
Jiayuan Ma

Reputation: 1901

DoFn you created is an anonymous inner class which has an implicit pointer to the enclosing class MyTransform. Your DoFn will follow that pointer and access obj just like a normal Java member field.

This is standard Java, which is not specific to Beam. Also, your Object needs to be serializable because when Beam is trying to serialize DoFn it will follow that pointer and try to serialize MyTransform and all its fields. You can read more about this here.

Upvotes: 1

Related Questions