Reputation: 180
The next run of my Airflow DAG, which has the schedule interval '00 14 * * 1,2,3,4,5', is shown as scheduled for 2022-10-31, 14:00:00, which was 5 hours ago. Is it possible to skip the unexecuted run without actually running the DAG? I also tried: triggering the DAG manually; changing the schedule interval to now, which runs the DAG but creates a 3-day-old log (2022-10-28, 19:29:00); changing catchup to True, which doesn't execute any new runs.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable

DEFAULT_ARGS = {
    "owner": Variable.get('airflow_user'),
    "start_date": datetime(2022, 10, 31),
    "email": Variable.get('airflow_email'),
    "email_on_failure": True,
    "retries": 0
}
dag = DAG(
    dag_id='dag',
    schedule_interval='00 14 * * 1,2,3,4,5',  # 14:00, Monday to Friday
    default_args=DEFAULT_ARGS,
    catchup=True
)
Upvotes: 3
Views: 5034
Reputation: 621
Take a look at data intervals.
When you create a dag with start_date=datetime(2022, 10, 31) and schedule_interval='00 14 * * 1,2,3,4,5', the first data interval runs from 2022/10/31 @ 14:00 to 2022/11/01 @ 14:00.
Dags always run at the end of their data interval. Airflow was designed as an orchestrator for ETL data processing, the idea being that a dag run for the 31st should cover the data for the 31st, so it can only start once that period is over.
Your next dag run isn't set in the past. The date you see is the start of its data interval, and the run for 2022/10/31 won't start until tomorrow at 14:00 (tomorrow, because your cron fires once per weekday).
Setting catchup=True with a start date of 2022/10/31 won't run a dag either, because there is no data interval for 2022/10/30 to 2022/10/31 to run (your dag schedule doesn't run on a Sunday).
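To make the interval arithmetic concrete, here is a rough plain-Python sketch. It does not use Airflow and is not how Airflow implements its timetables. It only mimics the rule described above for a schedule that fires at 14:00. The helper names (next_tick, first_data_interval, is_due) are made up for illustration, and the value of "now" is an assumption taken from the question (about 5 hours after 14:00 on 2022-10-31).

```python
from datetime import datetime, timedelta


def next_tick(after, weekdays_only, inclusive=False):
    """Return the next 14:00 tick after `after` (hypothetical helper).

    With inclusive=True, a tick that falls exactly on `after` is accepted.
    With weekdays_only=True, Saturday and Sunday are skipped.
    """
    candidate = after.replace(hour=14, minute=0, second=0, microsecond=0)
    if candidate < after or (candidate == after and not inclusive):
        candidate += timedelta(days=1)
    while weekdays_only and candidate.weekday() >= 5:  # 5 = Sat, 6 = Sun
        candidate += timedelta(days=1)
    return candidate


def first_data_interval(start_date, weekdays_only):
    """First (start, end) data interval at or after start_date."""
    start = next_tick(start_date, weekdays_only, inclusive=True)
    end = next_tick(start, weekdays_only)
    return start, end


def is_due(interval, now):
    """A run is only created once its data interval has ended."""
    _, end = interval
    return end <= now


# Assumed "now": roughly 5 hours after 14:00 on 2022-10-31, as in the question.
now = datetime(2022, 10, 31, 19, 0)

# Weekday schedule with start_date 2022-10-31 (the DAG from the question).
weekday_interval = first_data_interval(datetime(2022, 10, 31), weekdays_only=True)

# Daily schedule with start_date one day earlier.
daily_interval = first_data_interval(datetime(2022, 10, 30), weekdays_only=False)

print(weekday_interval, is_due(weekday_interval, now))
print(daily_interval, is_due(daily_interval, now))
```

Under these assumptions the weekday schedule's first interval ends on 2022-11-01 at 14:00, so nothing is due yet, while a daily schedule starting a day earlier has an interval that already ended at 14:00 on 2022-10-31.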
This dag:
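from datetime import datetime

from airflow import DAG
from airflow.models import Variable

DEFAULT_ARGS = {
    "owner": Variable.get('airflow_user'),
    "start_date": datetime(2022, 10, 30),  # one day earlier than in the question
    "email": Variable.get('airflow_email'),
    "email_on_failure": True,
    "retries": 0
}
dag = DAG(
    dag_id='dag',
    schedule_interval='00 14 * * *',  # every day at 14:00, weekends included
    default_args=DEFAULT_ARGS,
    catchup=True
)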
With catchup=True, the dag run for 2022/10/30 will run immediately, because its data interval (2022/10/30 @ 14:00 to 2022/10/31 @ 14:00) has already ended.
Upvotes: 4