Jonathan Duran

Reputation: 129

Why is Airflow not running the most recent task?

I've set up a DAG with the following parameters:

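from datetime import datetime, timedelta

import pendulum
from airflow import DAG

local_tz = pendulum.timezone('US/Eastern')

# Retry each failed task up to 3 times, waiting 5 minutes between attempts
default_args = {
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    dag_id='some_dag',
    start_date=datetime(2021, 1, 8, tzinfo=local_tz),
    schedule_interval='0 16 8 * *',  # 16:00 on the 8th of every month
    default_args=default_args,
    catchup=True  # backfill every interval that has ended since start_date
)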

I am expecting the most recent task run to be on May 8th; however, I only see February 8th, March 8th, and April 8th. I can't seem to figure out why Airflow stops in April.


It is currently May 25th, so shouldn't the May 8th DAG run have been backfilled along with the other months? To be clear, I deployed this DAG today, so all of the executed DAG runs, including the missing May 8th one, are backfills.

Upvotes: 1

Views: 603

Answers (1)

Elad Kalif

Reputation: 15931

This is expected. Airflow schedules a run at the end of its interval, not at the beginning. With your setup, the scheduling will look like this:

The 1st run will start on 2021-02-08; this run's execution_date will be 2021-01-08.

The 2nd run will start on 2021-03-08; this run's execution_date will be 2021-02-08.

The 3rd run will start on 2021-04-08; this run's execution_date will be 2021-03-08.

The 4th run will start on 2021-05-08; this run's execution_date will be 2021-04-08.

The 5th run will start on 2021-06-08; this run's execution_date will be 2021-05-08.

Since you actually deployed the DAG on 2021-05-25, Airflow executed the 1st through 4th runs at that moment, because the interval had already ended for each of them. The 5th run has not started yet because its interval has not ended; it will end on 2021-06-08.
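To make the arithmetic concrete, here is a minimal sketch in plain Python that mimics this end-of-interval rule. It does not use Airflow itself: the helper names monthly_points and completed_runs are made up for illustration, the dates are naive (time zones are ignored), and the deploy time is assumed to be 2021-05-25 as stated in the question.

```python
from datetime import datetime


def monthly_points(start, count, day=8, hour=16):
    """Return the first `count` schedule points (day 8, 16:00) at or after `start`."""
    points = []
    year, month = start.year, start.month
    while len(points) < count:
        candidate = datetime(year, month, day, hour)
        if candidate >= start:
            points.append(candidate)
        # Move on to the next calendar month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return points


def completed_runs(start, now):
    """Return (execution_date, run_start) pairs whose interval has ended by `now`."""
    points = monthly_points(start, 24)
    runs = []
    # Each run covers [execution_date, next schedule point) and only
    # starts once that next point has been reached.
    for execution_date, interval_end in zip(points, points[1:]):
        if interval_end <= now:
            runs.append((execution_date, interval_end))
    return runs


# Hypothetical inputs: start_date from the question, "now" = deploy day
runs = completed_runs(datetime(2021, 1, 8), datetime(2021, 5, 25))
for execution_date, run_start in runs:
    print(f"execution_date={execution_date:%Y-%m-%d} starts on {run_start:%Y-%m-%d}")
```

Under these assumptions the loop lists four runs, the last one with execution_date 2021-04-08. The run with execution_date 2021-05-08 is absent because its interval only closes on 2021-06-08.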

You can read a more extensive explanation of why Airflow behaves this way in this answer.

Upvotes: 1
