Reputation: 45
After some research, I found that the Google Cloud API documentation says you should use Pipeline.create(PipelineOptions) rather than DataflowPipeline.create(DataflowPipelineOptions), but it doesn't explain why. Could anyone clarify this for me?
Also, a follow-up question: using Pipeline.create(DataflowPipelineOptions) also works when running a pipeline. Is there any good reason not to do that, rather than reimplementing PipelineOptions with attributes that DataflowPipelineOptions already has, such as project?
Upvotes: 1
Views: 878
Reputation: 17913
PipelineOptions is a special class designed to hold a collection of options of many kinds at the same time. DataflowPipelineOptions is only one of the subsets of options it can hold, but when referring to the full collection of options, it makes more sense to call it PipelineOptions, because that is the more general and abstract concept, even though it is the same object as the DataflowPipelineOptions.
PipelineOptions is not even Dataflow-specific, partly because pipelines can be run using runners other than Dataflow, such as Spark and Flink, which have their own options. Hopefully this answers your second question.
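To make the "same object, different subsets" idea concrete, here is a minimal toy model in plain Java. It is only a sketch and not the SDK's actual code: Options, CloudOptions, ClusterOptions, view and create are hypothetical names made up for illustration, and the only thing borrowed from the real API is the idea of an as(Class) method, which in the SDK lets you write something like options.as(DataflowPipelineOptions.class). The sketch assumes one shared store of values that can be looked at through several interfaces.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, not part of the Dataflow SDK.
class OptionsViewsDemo {

    // Plays the role of the general options type.
    interface Options {
        <T extends Options> T as(Class<T> view);
    }

    // Plays the role of one runner-specific subset of options.
    interface CloudOptions extends Options {
        String getProject();
        void setProject(String project);
    }

    // Plays the role of another runner's subset of options.
    interface ClusterOptions extends Options {
        String getMaster();
        void setMaster(String master);
    }

    // Creates a typed view over a shared store of option values.
    static <T extends Options> T view(Class<T> type, Map<String, Object> store) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.equals("as")) {
                // Look at the same store through another interface.
                Class<?> target = (Class<?>) args[0];
                return view(target.asSubclass(Options.class), store);
            }
            if (name.startsWith("set")) {
                store.put(name.substring(3), args[0]);
                return null;
            }
            if (name.startsWith("get")) {
                return store.get(name.substring(3));
            }
            throw new UnsupportedOperationException(name);
        };
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(instance);
    }

    // Starts with the most general view and an empty store.
    static Options create() {
        return view(Options.class, new HashMap<>());
    }

    public static void main(String[] args) {
        Options general = create();
        general.as(CloudOptions.class).setProject("my-project");
        general.as(ClusterOptions.class).setMaster("local");

        // Any view can reach values that were set through any other view.
        CloudOptions cloud = general.as(CloudOptions.class);
        System.out.println(cloud.getProject());
        System.out.println(cloud.as(ClusterOptions.class).getMaster());
    }
}
```

In this model, code that accepts the general Options type does not need to know which runner-specific subsets have been filled in, which is the same reason an API would prefer to take the general type rather than one specific subset.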
Please see "Specifying execution parameters" for details.
Upvotes: 2