abhishek jha

Reputation: 1095

Stop executing a pipeline transform while other pipeline transforms keep running

I have a number of files in Google Storage that I need to write to multiple BigQuery tables after applying a simple ParDo transform, and I am trying to do this with a single pipeline. So I effectively have several parallel, unconnected sources and sinks running in one Dataflow job. In the ParDo transform I have a condition; when it evaluates to true, writing to that particular BigQuery table (that transform) should stop, while writing to the other BigQuery tables (the other transforms) continues as usual.

[Image: Dataflow job graph showing two parallel source-to-BigQuery branches]

In this image, there are 2 parallel sources and 2 parallel sinks. Because of some bad data in the source for date 2014-08-01, the first transform failed, and once it failed, the 2014-08-02 transform got cancelled, even though the 2014-08-02 transform had no bad data.

Is there a way to prevent the cancellation of the other transform?
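Roughly, the single pipeline is built like this (a simplified sketch using the Apache Beam Java API; the bucket, project, table names, and the failure condition inside the ParDo are placeholders for my real ones):

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class MultiBranchPipeline {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // One unconnected branch per date: GCS -> ParDo -> BigQuery table.
    for (String date : Arrays.asList("2014-08-01", "2014-08-02")) {
      p.apply("Read-" + date, TextIO.read().from("gs://my-bucket/" + date + "/*"))
       .apply("Parse-" + date, ParDo.of(new DoFn<String, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            String line = c.element();
            // Placeholder for the real validity check; a thrown exception here
            // fails the whole job, not just this branch.
            if (line.isEmpty()) {
              throw new RuntimeException("bad record for " + date);
            }
            c.output(new TableRow().set("raw", line));
          }
        }))
       .apply("Write-" + date, BigQueryIO.writeTableRows()
           .to("my-project:my_dataset.events_" + date.replace("-", ""))
           .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
    }

    p.run();
  }
}
```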

Upvotes: 2

Views: 221

Answers (1)

danielm

Reputation: 3010

Currently in the Dataflow service, an entire pipeline will either succeed or fail, and any failure will cancel the rest of the pipeline. There's no way to change this behavior; you need to run separate pipelines if you want to have them succeed or fail separately.

Note that operationally, you can run both pipelines from the same Java main program; just create two different Pipeline objects and invoke run() on them separately.
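For example (a minimal sketch using the Apache Beam Java API; the TextIO-to-TextIO transforms are just placeholders for your GCS-to-BigQuery branches):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class TwoIndependentPipelines {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

    // First pipeline: submitted as its own job, succeeds or fails on its own.
    Pipeline p1 = Pipeline.create(options);
    p1.apply(TextIO.read().from("gs://my-bucket/2014-08-01/*"))
      .apply(TextIO.write().to("gs://my-bucket/out/2014-08-01"));
    p1.run();  // or p1.run().waitUntilFinish() to block until it completes

    // Second pipeline: a separate job; a failure in p1 does not cancel it.
    Pipeline p2 = Pipeline.create(options);
    p2.apply(TextIO.read().from("gs://my-bucket/2014-08-02/*"))
      .apply(TextIO.write().to("gs://my-bucket/out/2014-08-02"));
    p2.run();
  }
}
```

Each run() submits an independent Dataflow job, so bad data in one job's source only fails that job.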

Upvotes: 2
