Etienne

Reputation: 45

DataflowPipeline and DataflowPipelineOptions vs. Pipeline and PipelineOptions

After some research, I found that the Google Cloud API documentation says you should use Pipeline.create(PipelineOptions) rather than DataflowPipeline.create(DataflowPipelineOptions), but it doesn't explain why. Could anyone clarify this for me?

Also, a follow-up question: Pipeline.create(DataflowPipelineOptions) also works when running a pipeline. Is there any good reason not to do that, rather than reimplementing PipelineOptions with attributes that DataflowPipelineOptions already has, such as project?

Upvotes: 1

Views: 878

Answers (1)

jkff

Reputation: 17913

PipelineOptions is a special type designed to hold a collection of options of many kinds at the same time. DataflowPipelineOptions is only one of the subsets of options it can hold. When referring to the full collection of options, it makes more sense to call it PipelineOptions, because that is the more general and abstract concept, even though it is the same object as the DataflowPipelineOptions.

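PipelineOptions is not even Dataflow-specific, partly because pipelines can be run using runners other than Dataflow, such as Spark and Flink, which have their own options. Hopefully this also answers your second question: since DataflowPipelineOptions is just one view of the same options object, passing it to Pipeline.create(PipelineOptions) is fine, and there is no need to reimplement properties such as project.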
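To make the "same object, different views" idea concrete, here is a toy model in plain Java. It is not the SDK's code: Options, CloudOptions, LocalOptions, Handler and newOptions are hypothetical names I made up for illustration, and the use of java.lang.reflect.Proxy is my assumption about one way to get this effect. The only part borrowed from the real API is the shape of as(Class), which PipelineOptions also offers for switching views.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class OptionsViewSketch {

    /** Hypothetical stand-in for the general options type. */
    public interface Options {
        <T extends Options> T as(Class<T> view);
    }

    /** Hypothetical stand-in for one runner's subset of options. */
    public interface CloudOptions extends Options {
        String getProject();
        void setProject(String project);
    }

    /** Hypothetical stand-in for another runner's subset of options. */
    public interface LocalOptions extends Options {
        String getMaster();
        void setMaster(String master);
    }

    /** One shared map backs every view, so all views see the same values. */
    private static final class Handler implements InvocationHandler {
        private final Map<String, Object> values = new HashMap<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // toString/hashCode/equals are answered by the handler itself.
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(this, args);
            }
            String name = method.getName();
            if (name.equals("as")) {
                // A new view over the SAME handler, hence the same stored values.
                return view((Class<?>) args[0], this);
            }
            if (name.startsWith("set")) {
                values.put(name.substring(3), args[0]);
                return null;
            }
            if (name.startsWith("get")) {
                return values.get(name.substring(3));
            }
            throw new UnsupportedOperationException(name);
        }
    }

    private static <T> T view(Class<T> type, Handler handler) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }

    /** Creates an empty options collection, seen through the general view. */
    public static Options newOptions() {
        return view(Options.class, new Handler());
    }

    public static void main(String[] args) {
        Options options = newOptions();
        options.as(CloudOptions.class).setProject("my-project");
        options.as(LocalOptions.class).setMaster("local[2]");

        // Both subsets live in the one collection reached through Options.
        System.out.println(options.as(CloudOptions.class).getProject());
        System.out.println(options.as(LocalOptions.class).getMaster());
    }
}
```

Code that only needs "the options" can accept the general type and ask for a specific view when it needs one, which is roughly why an API would take PipelineOptions rather than a runner-specific subtype.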

Please see Specifying execution parameters for details.

Upvotes: 2
