oikonomiyaki

Reputation: 7951

Deploying Dataflow job that runs for X hours

We are deploying/triggering Dataflow streaming jobs through Airflow using flex template. We want these streaming jobs to run, say for 24 hours (or until a certain clock time), then stop/cancel on its own. Is there a parameter in Dataflow (pipeline setting like max workers) that will do this?

Upvotes: 0

Views: 141

Answers (1)

Mazlum Tosun

Reputation: 6572

I don't think Dataflow has a parameter or automatic mechanism to stop or drain a job after a set duration.

You can do that with an Airflow DAG instead. For example, you can create a cron DAG in Airflow (scheduled every 24 hours) that is responsible for stopping or draining the Dataflow job; there is a built-in operator for that:

from airflow.providers.google.cloud.operators.dataflow import DataflowStopJobOperator

stop_dataflow_job = DataflowStopJobOperator(
    task_id="stop-dataflow-job",
    location="europe-west3",
    job_name_prefix="start-template-job",
)

To stop one or more Dataflow pipelines you can use DataflowStopJobOperator. Streaming pipelines are drained by default; set drain_pipeline to False to cancel them instead. Provide job_id to stop a specific job, or job_name_prefix to stop all jobs whose names start with that prefix.
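Putting it together, a minimal sketch of such a daily "stop" DAG could look like this. It assumes the streaming job was launched with the job name prefix start-template-job in europe-west3 (as in the snippet above); the DAG id, schedule, and start date are placeholders to adapt to your setup:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataflow import (
        DataflowStopJobOperator,
    )

    with DAG(
        dag_id="stop_streaming_dataflow_job",   # hypothetical DAG name
        schedule_interval="@daily",             # run once every 24 hours
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        stop_dataflow_job = DataflowStopJobOperator(
            task_id="stop-dataflow-job",
            location="europe-west3",
            job_name_prefix="start-template-job",  # stops all jobs with this prefix
            drain_pipeline=True,  # default: drain; set to False to cancel instead
        )

If you need the job to stop at a precise clock time rather than "every 24 hours", you can set the cron expression accordingly (for example "0 2 * * *" to drain at 02:00 each day).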

Upvotes: 1
