Reputation: 1549
I have implemented a PTransform
:
public class MyTransform extends PTransform<PCollection<I>, PCollection<T>> {
private Object obj;
public MyTransform(Object obj) {
this.obj = obj;
}
@Override
public PCollection<O> expand(PCollection<I> input) {
return input.apply(ParDo.of(new DoFn<I, O>() {
@ProcessElement
public void processElement(ProcessContext c) {
I in = c.element();
c.output(obj.m1(in));
}
}));
}
}
I need to use Object
parameter inside DoFn
. The Object
parameter, used inside MyTransform
, is serialized multiple times to DoFn
function or only when it is created?
The Beam Documentation not cover the case with parameters
Upvotes: 0
Views: 1389
Reputation: 2621
Just for clarification of terms, obj
is never actually "passed" to the DoFn
class. obj
is initialized as a private member of the MyTransform
class and is accessible inside the scope of the expand
method. MyTransform
, and therefore obj
, are initialized once for each time MyTransform
is serialized for execution.
So to answer your question, obj
is "passed" to the DoFn
class only when MyTransform
is created. But again, this is not the correct terminology.
Upvotes: 0
Reputation: 1901
DoFn
you created is an anonymous inner class which has an implicit pointer to the enclosing class MyTransform
. Your DoFn
will follow that pointer and access obj
just like a normal Java member field.
This is standard Java, which is not specific to Beam. Also, your Object
needs to be serializable because when Beam is trying to serialize DoFn
it will follow that pointer and try to serialize MyTransform
and all its fields. You can read more about this here.
Upvotes: 1