Reputation: 7951
We are deploying/triggering Dataflow streaming jobs through Airflow using a flex template. We want these streaming jobs to run for, say, 24 hours (or until a certain clock time), then stop/cancel on their own. Is there a parameter in Dataflow (a pipeline setting like max workers) that will do this?
Upvotes: 0
Views: 141
Reputation: 6572
I think there is no parameter or automatic mechanism in Dataflow to stop or drain a job after a given duration.

You can do that with an Airflow DAG instead. For example, you can create a cron-scheduled DAG in Airflow (every 24 hours) whose responsibility is to stop or drain the Dataflow job; there is a built-in operator for that:
from airflow.providers.google.cloud.operators.dataflow import DataflowStopJobOperator

# Drains (or cancels) the Dataflow job(s) whose name matches the given prefix.
stop_dataflow_job = DataflowStopJobOperator(
    task_id="stop-dataflow-job",
    location="europe-west3",
    job_name_prefix="start-template-job",
)
To stop one or more Dataflow pipelines you can use DataflowStopJobOperator. Streaming pipelines are drained by default; setting drain_pipeline to False will cancel them instead. Provide job_id to stop a specific job, or job_name_prefix to stop all jobs with the provided name prefix.
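For completeness, here is a minimal sketch of such a cron DAG, assuming the apache-airflow-providers-google package is installed; the project id, schedule, and job name prefix below are hypothetical placeholders to adapt to your deployment:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import DataflowStopJobOperator

with DAG(
    dag_id="stop_streaming_dataflow_job",      # hypothetical DAG id
    schedule_interval="0 3 * * *",             # once a day, e.g. 03:00 UTC
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    stop_dataflow_job = DataflowStopJobOperator(
        task_id="stop-dataflow-job",
        project_id="my-gcp-project",           # hypothetical project id
        location="europe-west3",
        job_name_prefix="start-template-job",  # matches the streaming job(s) to stop
        drain_pipeline=True,                   # drain (default); set False to cancel
    )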
Upvotes: 1