Reputation: 1522
My Airflow scheduler went down for some reason, and when I restarted it, all the DAGs triggered simultaneously, as if it were catching up on the missed jobs. A workflow also seems to trigger whenever I modify a DAG. These unexpected triggers corrupt my data and undermine trust in the system.
Is there a way to prevent a DAG running unexpectedly unless it is the exact time (no catch-up) or unless it is manually triggered?
Upvotes: 0
Views: 1524
Reputation: 750
The Airflow scheduler will, at a minimum, attempt to run the current schedule interval as soon as it is online and able to do so. This means that if the scheduler process is offline for a period of time, when it comes back online it will work out which jobs should have run and attempt to start them.
You have some control over this through the catchup setting: with catchup=False, the scheduler runs only the latest schedule interval and does not run any earlier intervals that were missed.
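To make the difference concrete, here is a minimal sketch in plain Python of which intervals would be picked up after an outage. It is a simplified model for illustration only, not Airflow's actual scheduling code; the function name runs_to_create and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def runs_to_create(last_run, now, interval, catchup):
    """Return the start times of the intervals that would be run.

    Simplified model (an assumption, not Airflow internals): an interval
    that starts at t becomes runnable once it has ended, i.e. t + interval <= now.
    """
    missed = []
    start = last_run + interval
    while start + interval <= now:
        missed.append(start)
        start += interval
    if catchup:
        # Back-fill every interval that completed while the scheduler was down.
        return missed
    # Without catch-up, keep only the most recent completed interval (if any).
    return missed[-1:]


if __name__ == "__main__":
    last = datetime(2021, 1, 1)     # start of the last interval that ran
    now = datetime(2021, 1, 5, 12)  # scheduler comes back online
    day = timedelta(days=1)
    print(runs_to_create(last, now, day, catchup=True))
    print(runs_to_create(last, now, day, catchup=False))
```

With a daily interval and a few days of downtime, the catch-up case returns one entry per missed day, while the no-catch-up case returns only the latest one.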
More information on catchup is here: https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html#catchup
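As a rough sketch, disabling catch-up on a single DAG could look like the following. This assumes an Airflow 2.x installation, where the schedule argument is named schedule_interval (newer releases use schedule instead); the dag_id, start date, and task are made up for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path (assumption)

with DAG(
    dag_id="example_no_catchup",      # hypothetical name
    start_date=datetime(2021, 1, 1),  # hypothetical start date
    schedule_interval="@daily",
    catchup=False,                    # do not back-fill missed intervals
) as dag:
    # Placeholder task so the DAG has something to run.
    BashOperator(task_id="say_hello", bash_command="echo hello")
```

Note that even with catchup=False, the scheduler still runs the most recent interval when it comes back online, as described above.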
Is there a way to prevent a DAG running unexpectedly unless it is the exact time (no catch-up) or unless it is manually triggered?
There is no way to tell Airflow to attempt a job only at the exact time it is supposed to run and never try again afterward. However, you can set the schedule interval to None, in which case the job is never scheduled automatically and runs only when you trigger it manually through the UI or via the Airflow API.
https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html#cron-presets
preset | meaning
-------+----------------------------------------------------------------
None | Don’t schedule, use for exclusively “externally triggered” DAGs
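For a DAG that should run only on demand, a minimal sketch might look like this. It again assumes Airflow 2.x and its schedule_interval argument; the dag_id, start date, and task are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path (assumption)

with DAG(
    dag_id="example_manual_only",     # hypothetical name
    start_date=datetime(2021, 1, 1),  # hypothetical start date
    schedule_interval=None,           # never scheduled automatically
) as dag:
    # Placeholder task; runs only when the DAG is triggered externally.
    BashOperator(task_id="say_hello", bash_command="echo hello")
```

A DAG defined this way stays idle across scheduler restarts until someone triggers it, for example from the UI, through the API, or with the airflow dags trigger command in the 2.x CLI.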
Upvotes: 3