Meio

Reputation: 180

Airflow DAG next run is stuck in past

The next Airflow DAG run for the '00 14 * * 1,2,3,4,5' schedule interval is scheduled for 2022-10-31, 14:00:00, which was 5 hours ago. Is it possible to skip this unexecuted run without actually running the DAG?

I also tried:
  • Triggering the DAG manually.
  • Changing the schedule interval to the current time and running the DAG, which creates a 3-day-old log (2022-10-28, 19:29:00).
  • Changing catchup to True, which doesn't execute any new runs.

from datetime import datetime

from airflow import DAG
from airflow.models import Variable

DEFAULT_ARGS = {
    "owner": Variable.get('airflow_user'),
    "start_date": datetime(2022, 10, 31),
    "email": Variable.get('airflow_email'),
    "email_on_failure": True,
    "retries": 0
}

dag = DAG(
    dag_id='dag',
    schedule_interval='00 14 * * 1,2,3,4,5',  # 14:00, Monday to Friday
    default_args=DEFAULT_ARGS,
    catchup=True
)

Upvotes: 3

Views: 5034

Answers (1)

Daniel T

Reputation: 621

Take a look at data intervals.

When you create a dag with start_date=datetime(2022, 10, 31) and schedule_interval='00 14 * * 1,2,3,4,5', its first data interval runs from 2022/10/31 @ 14:00 to 2022/11/01 @ 14:00.

Scheduled dag runs always start at the end of their data interval. Airflow was designed as an orchestrator for ETL data processing: the idea is that a dag run for the 31st should process the data for the 31st, so it can only start once that interval is over.

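Your next dag run isn't set in the past. The date shown (2022/10/31 @ 14:00) is the run's logical date, i.e. the start of its data interval, not the time it executes. The dag run for 2022/10/31 won't execute until tomorrow, 2022/11/01 @ 14:00, because that is the next time your cron expression fires. Your cron only fires on weekdays, so a run for a Friday would wait until the following Monday.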
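The interval arithmetic above can be sketched in plain Python. This is a simplified model for illustration only, assuming the cron '00 14 * * 1,2,3,4,5' means 14:00 Monday to Friday; `next_tick`, `first_data_interval` and `data_interval` are hypothetical helpers, not Airflow APIs, and the real scheduler (its cron timetable) handles more cases such as timezones.

```python
from datetime import datetime, timedelta

# Hypothetical helpers, NOT Airflow APIs: a hand-rolled stand-in for the cron
# '00 14 * * 1,2,3,4,5' (14:00, Monday to Friday).
# Python's weekday() numbers Monday as 0, so cron days 1-5 become 0-4.
WEEKDAYS = {0, 1, 2, 3, 4}


def next_tick(after: datetime, inclusive: bool = False) -> datetime:
    """First 14:00 weekday tick after `after` (or equal to it if inclusive)."""
    candidate = after.replace(hour=14, minute=0, second=0, microsecond=0)
    while not (
        candidate.weekday() in WEEKDAYS
        and (candidate > after or (inclusive and candidate == after))
    ):
        candidate += timedelta(days=1)
    return candidate


def first_data_interval(start_date: datetime) -> tuple[datetime, datetime]:
    """Starts on the first tick at or after start_date, ends on the next tick.
    The run for this interval is started at its end."""
    start = next_tick(start_date, inclusive=True)
    return start, next_tick(start)


def data_interval(logical_date: datetime) -> tuple[datetime, datetime]:
    """For a logical date that falls on a tick, the interval ends on the next tick."""
    return logical_date, next_tick(logical_date)


print(first_data_interval(datetime(2022, 10, 31)))
# The logical date below is a Friday; its interval ends the following Monday.
print(data_interval(datetime(2022, 11, 4, 14)))
```

With start_date 2022/10/31 the sketch yields the interval 2022/10/31 @ 14:00 to 2022/11/01 @ 14:00, so the run shown as "2022-10-31, 14:00" is only started on 2022/11/01 @ 14:00.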

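Setting catchup=True with a start date of 2022/10/31 won't trigger a run either. Catchup only creates runs for data intervals that have already ended, and the first data interval starts at the first cron tick on or after the start date (2022/10/31 @ 14:00). There is no earlier interval, such as 2022/10/30 to 2022/10/31, to catch up on, because your dag schedule doesn't run on a Sunday.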
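A minimal sketch of why catchup finds nothing to run, using the same simplified model of the weekday cron. `intervals_to_catch_up` is a hypothetical helper, not Airflow code, and "now" is assumed to be 2022/10/31 @ 19:00 (roughly 5 hours after the 14:00 tick mentioned in the question).

```python
from datetime import datetime, timedelta

# Hypothetical, simplified model of the cron '00 14 * * 1,2,3,4,5'; not Airflow code.
WEEKDAYS = {0, 1, 2, 3, 4}  # Monday..Friday in Python's weekday() numbering


def next_tick(after: datetime, inclusive: bool = False) -> datetime:
    """First 14:00 weekday tick after `after` (or equal to it if inclusive)."""
    candidate = after.replace(hour=14, minute=0, second=0, microsecond=0)
    while not (
        candidate.weekday() in WEEKDAYS
        and (candidate > after or (inclusive and candidate == after))
    ):
        candidate += timedelta(days=1)
    return candidate


def intervals_to_catch_up(start_date: datetime, now: datetime) -> list:
    """Every data interval starting on or after start_date that has ended by `now`."""
    due = []
    start = next_tick(start_date, inclusive=True)
    end = next_tick(start)
    while end <= now:
        due.append((start, end))
        start, end = end, next_tick(end)
    return due


now = datetime(2022, 10, 31, 19)  # assumed: roughly when the question was asked
for start_date in (datetime(2022, 10, 31), datetime(2022, 10, 30)):
    print(start_date.date(), intervals_to_catch_up(start_date, now))
```

Both start dates produce an empty list: the Sunday start date skips forward to the Monday 14:00 tick, and that first interval does not end until 2022/11/01 @ 14:00.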

This dag:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable

DEFAULT_ARGS = {
    "owner": Variable.get('airflow_user'),
    "start_date": datetime(2022, 10, 30),
    "email": Variable.get('airflow_email'),
    "email_on_failure": True,
    "retries": 0
}

dag = DAG(
    dag_id='dag',
    schedule_interval='00 14 * * *',  # 14:00 every day, including Sunday
    default_args=DEFAULT_ARGS,
    catchup=True
)
  • Defines a data interval from 2022/10/30 @ 14:00 to 2022/10/31 @ 14:00.
  • Has already missed its dag run for 2022/10/30, since that interval ended at 14:00 on the 31st.
  • As catchup=True, the dag run for 2022/10/30 will run immediately.
  • Then creates a new data interval from 2022/10/31 @ 14:00 to 2022/11/01 @ 14:00.
  • The dag run for 2022/10/31 will run on 2022/11/01 @ 14:00.
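The walk-through above can be reproduced with a small simulation of the daily schedule. This is a simplified sketch, not Airflow's scheduler: `next_tick` and `plan` are hypothetical helpers, the cron '00 14 * * *' is modelled as 14:00 on every day of the week, and "now" is assumed to be 2022/10/31 @ 19:00.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the daily cron '00 14 * * *' (14:00 every day).
# A simplified model of a cron schedule, not Airflow's actual timetable code.
EVERY_DAY = set(range(7))


def next_tick(after: datetime, days=EVERY_DAY, inclusive: bool = False) -> datetime:
    """First 14:00 tick on one of `days` after `after` (or equal to it if inclusive)."""
    candidate = after.replace(hour=14, minute=0, second=0, microsecond=0)
    while not (
        candidate.weekday() in days
        and (candidate > after or (inclusive and candidate == after))
    ):
        candidate += timedelta(days=1)
    return candidate


def plan(start_date: datetime, now: datetime, days=EVERY_DAY):
    """Split the timeline into intervals already due (ended by `now`)
    and the next pending interval, whose run starts at its end."""
    due = []
    start = next_tick(start_date, days, inclusive=True)
    end = next_tick(start, days)
    while end <= now:
        due.append((start, end))
        start, end = end, next_tick(end, days)
    return due, (start, end)


due, pending = plan(datetime(2022, 10, 30), now=datetime(2022, 10, 31, 19))
print("due now:", due)
print("next run starts at:", pending[1])
```

The single due interval is 2022/10/30 @ 14:00 to 2022/10/31 @ 14:00, and the pending interval for 2022/10/31 is started at 2022/11/01 @ 14:00, matching the list above.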

Upvotes: 4
