Steve Scott

Reputation: 1522

Prevent Airflow from triggering on scheduler restart

My Airflow scheduler went down for some reason, and when I restarted it, all the DAGs triggered simultaneously, as if it were catching up on the missed jobs. It also seems that modifying a DAG triggers the workflow. These unexpected triggers corrupt my data and undermine trust in the system.

Is there a way to prevent a DAG running unexpectedly unless it is the exact time (no catch-up) or unless it is manually triggered?

Upvotes: 0

Views: 1524

Answers (1)

Tyler

Reputation: 750

The Airflow scheduler will, at a minimum, attempt to run the most recent schedule interval as soon as it is online to do so. This means that if the scheduler process is offline for a period of time, it will work out which jobs should have run when it comes back online and attempt to start them.

You have some control through the catchup setting: with catchup=False, the scheduler runs only the latest schedule interval and skips the earlier intervals that were missed.

Some info on catchup here: https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html#catchup
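To make the difference concrete, here is a rough sketch of which runs get created after downtime, with and without catchup. This is a simplified model written for illustration, not Airflow's actual scheduler code; the function name runs_to_create and the dates are made up. In a real DAG the behaviour is selected by passing catchup=False to the DAG.

```python
from datetime import datetime, timedelta


def runs_to_create(last_run, now, interval, catchup):
    """Return the logical dates a scheduler would start runs for at `now`.

    Simplified model (an assumption, not Airflow internals): a run for a
    given interval becomes due once that interval has fully elapsed.
    """
    missed = []
    candidate = last_run + interval
    while candidate + interval <= now:
        missed.append(candidate)
        candidate += interval
    if catchup:
        # Catch-up on: every missed interval gets its own run.
        return missed
    # Catch-up off: only the most recent missed interval is run.
    return missed[-1:]


# Scheduler last ran the 1 Jan interval, then was offline until 5 Jan 06:00.
last_run = datetime(2024, 1, 1)
now = datetime(2024, 1, 5, 6, 0)
daily = timedelta(days=1)

with_catchup = runs_to_create(last_run, now, daily, catchup=True)
without_catchup = runs_to_create(last_run, now, daily, catchup=False)
```

With catchup on, the restart produces one run per missed day, which matches the "everything triggered at once" behaviour described in the question. With catchup off, only the latest interval is run, but that one run still happens on restart.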

> Is there a way to prevent a DAG running unexpectedly unless it is the exact time (no catch-up) or unless it is manually triggered?

There is no way to tell Airflow to attempt a job only at the exact time it is supposed to run and never try again after the fact. You can, however, set the schedule interval to None, in which case the DAG is never scheduled automatically. It then runs only when you trigger it manually through the UI or via the Airflow API.

https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html#cron-presets

preset | meaning
-------+----------------------------------------------------------------
None   | Don’t schedule, use for exclusively “externally triggered” DAGs
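A minimal sketch of a manually triggered DAG, assuming Airflow 2.x, where the parameter is named schedule_interval (newer releases rename it to schedule). The DAG id manual_only_example and the helper build_dag are hypothetical names, and the Airflow import is kept inside the function only so the snippet loads on a machine without Airflow installed.

```python
from datetime import datetime

# Arguments for a DAG that is never scheduled automatically.
# "manual_only_example" is a hypothetical DAG id used for illustration.
DAG_ARGS = {
    "dag_id": "manual_only_example",
    "schedule_interval": None,  # no automatic runs; assumes Airflow 2.x naming
    "start_date": datetime(2024, 1, 1),
    "catchup": False,  # irrelevant without a schedule, but makes the intent explicit
}


def build_dag():
    """Create the DAG object; Airflow is imported lazily inside the function."""
    from airflow import DAG

    return DAG(**DAG_ARGS)
```

In an actual DAG file you would assign dag = build_dag() at module level so the scheduler can discover it, then add tasks as usual. A run can then be started from the UI or, on Airflow 2.x, from the command line with: airflow dags trigger manual_only_example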
    

Upvotes: 3
