Reputation: 129
I've set up a DAG with the following parameters:
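from datetime import datetime, timedelta

import pendulum
from airflow import DAG

local_tz = pendulum.timezone('US/Eastern')

default_args = {
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

# Cron schedule: 16:00 on the 8th of every month.
dag = DAG(
    dag_id='some_dag',
    start_date=datetime(2021, 1, 8, tzinfo=local_tz),
    schedule_interval='0 16 8 * *',
    default_args=default_args,
    catchup=True,  # backfill past intervals since start_date
)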
I expected the most recent DAG run to be for May 8th, but I only see February 8th, March 8th, and April 8th. I can't figure out why Airflow stops in April.
It is currently May 25th, so shouldn't the May 8th DAG run have backfilled along with the other months? To be clear, I deployed this DAG today, so all of the executed DAG runs, including the missing May 8th one, are backfills.
Upvotes: 1
Views: 603
Reputation: 15931
This is expected. As you mentioned, Airflow schedules a run at the end of its interval, not at the start. With your setup the schedule looks like this:
The 1st run will start on 2021-02-08; this run's execution_date will be 2021-01-08.
The 2nd run will start on 2021-03-08; this run's execution_date will be 2021-02-08.
The 3rd run will start on 2021-04-08; this run's execution_date will be 2021-03-08.
The 4th run will start on 2021-05-08; this run's execution_date will be 2021-04-08.
The 5th run will start on 2021-06-08; this run's execution_date will be 2021-05-08.
Since you actually deployed the DAG on 2021-05-26, Airflow executed the 1st-4th runs at that moment, because the interval for each of those runs had already ended. The 5th run has not started yet because its interval has not ended; it will end on 2021-06-08.
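To make the arithmetic concrete, here is a rough sketch in plain Python that mimics the "run at the end of the interval" rule for this monthly schedule. It is not Airflow's scheduler code: the helper names monthly_ticks and completed_runs are made up for illustration, and it uses naive datetimes, so the US/Eastern timezone handling is ignored.

```python
from datetime import datetime


def monthly_ticks(start, count, day=8, hour=16):
    """Yield `count` monthly schedule ticks (day at hour:00) at or after `start`.

    Hypothetical helper standing in for the cron expression '0 16 8 * *'.
    """
    year, month = start.year, start.month
    produced = 0
    while produced < count:
        tick = datetime(year, month, day, hour)
        if tick >= start:
            yield tick
            produced += 1
        month += 1
        if month > 12:
            month, year = 1, year + 1


def completed_runs(start, now):
    """Return (execution_date, run_start) pairs whose interval has ended by `now`."""
    runs = []
    ticks = monthly_ticks(start, 24)
    interval_start = next(ticks)
    for interval_end in ticks:
        if interval_end > now:
            # This interval is still open, so its run is not scheduled yet.
            break
        runs.append((interval_start, interval_end))
        interval_start = interval_end
    return runs


if __name__ == "__main__":
    start_date = datetime(2021, 1, 8)  # same start_date as the DAG (naive)
    now = datetime(2021, 5, 26)        # assumed deployment date
    for execution_date, run_start in completed_runs(start_date, now):
        print(f"execution_date={execution_date:%Y-%m-%d}  runs on {run_start:%Y-%m-%d}")
```

Under these assumptions the script prints four lines, the last one being execution_date=2021-04-08 running on 2021-05-08. The run with execution_date 2021-05-08 only appears once `now` reaches 2021-06-08 16:00.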
You can read a more extensive explanation of why Airflow behaves this way in this answer.
Upvotes: 1