I have an Airflow DAG that previously did not have a schedule. It is turned on in the Airflow interface and I was running it manually. Just now, I updated the schedule as follows:
[in the default arguments object that is fed into the DAG]
'catchup': True
'start_date': datetime.datetime(2020, 8, 1)
[in my DAG object instantiation]
schedule_interval='0 0 17 * *'
Right now it is August 18, 2020 in UTC. I expected the DAG to run immediately once the code changes were deployed, but so far it isn't running.
I have said that the schedule starts on August 1, 2020, and the schedule interval means "every month on the 17th," so by that definition it has missed a run at midnight on August 17th. Why isn't it catching up on that past run, since catchup is set to True? When will it first run?
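The schedule points implied by this configuration can be listed with a minimal stdlib-only sketch. It is not Airflow code: cron_17th_points is a hypothetical helper that hard-codes the cron expression '0 0 17 * *' instead of parsing it, and it uses naive datetimes read as UTC.

```python
from datetime import datetime
from itertools import islice

def cron_17th_points(start):
    """Yield every midnight on the 17th of a month at or after `start`.

    Hypothetical stand-in for the cron expression '0 0 17 * *'
    (naive datetimes, read as UTC); it does not parse cron syntax.
    """
    year, month = start.year, start.month
    while True:
        point = datetime(year, month, 17)
        if point >= start:
            yield point
        # Advance to the same day of the following month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

# The first three schedule points on or after start_date = August 1, 2020.
points = list(islice(cron_17th_points(datetime(2020, 8, 1)), 3))
for p in points:
    print(p)
# 2020-08-17 00:00:00
# 2020-09-17 00:00:00
# 2020-10-17 00:00:00
```

Listing the points only shows when the cron expression matches; it says nothing yet about when Airflow triggers the run for each point.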
I know that there is some controversy surrounding the confusing behavior of schedule_interval, because the first run happens only after the first interval following start_date. However, even those discussions, which I have read, deal with the case where schedule_interval is an actual interval like @daily and where somebody has placed the first run in the future. I cannot find any documentation for what catchup should do when the new schedule starts in the past and/or when schedule_interval is a cron expression.
Upvotes: 1
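You want to set your start_date one "interval" of your schedule_interval before the run you want to happen, where "interval" is the amount of time between subsequent executions. Airflow triggers the run for an interval only after that interval has ended: with a start_date of August 1, 2020, the first run has an execution_date of August 17, 2020, but it is not triggered until September 17, 2020, when that interval closes. That is why nothing has run yet, and catchup does not change it, because there is no completed interval to catch up on.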
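The end-of-interval rule can be modeled with a small stdlib-only sketch. It is not Airflow's implementation: next_17th and first_run are hypothetical helpers that hard-code the cron '0 0 17 * *' and use naive datetimes read as UTC.

```python
from datetime import datetime

def next_17th(after, strict=True):
    """Return the next midnight on the 17th after `after`.

    With strict=False, `after` itself is returned if it already matches.
    Hypothetical stand-in for the cron '0 0 17 * *'.
    """
    candidate = datetime(after.year, after.month, 17)
    if candidate < after or (strict and candidate == after):
        # This month's 17th is not usable: take next month's.
        year, month = (after.year + 1, 1) if after.month == 12 else (after.year, after.month + 1)
        candidate = datetime(year, month, 17)
    return candidate

def first_run(start_date):
    """Return (execution_date, trigger_time) of the first scheduled run.

    The execution_date is the first schedule point at or after start_date;
    the run is only triggered once its interval closes, i.e. at the
    following schedule point.
    """
    execution_date = next_17th(start_date, strict=False)
    return execution_date, next_17th(execution_date)

now = datetime(2020, 8, 18)
execution_date, trigger_time = first_run(datetime(2020, 8, 1))
print(execution_date, trigger_time, trigger_time <= now)
# 2020-08-17 00:00:00 2020-09-17 00:00:00 False
```

Passing datetime(2020, 7, 17) instead gives an execution_date of July 17 and a trigger time of August 17, which is already in the past on August 18, so the scheduler has a run it can create right away.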
The easiest way to do this is to set start_date to the date on which the DAG would have run just before the date you want it to run, if it were already installed and running. In this instance that is datetime.datetime(2020, 7, 17).
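Stepping back one schedule point from the time you want the first run to fire can be sketched the same way. prev_17th is a hypothetical helper that hard-codes the cron '0 0 17 * *' and uses naive datetimes read as UTC; it is an illustration, not Airflow API.

```python
from datetime import datetime

def prev_17th(before):
    """Return the last midnight on the 17th strictly before `before`.

    Hypothetical stand-in for stepping back one schedule point of the
    cron '0 0 17 * *'.
    """
    candidate = datetime(before.year, before.month, 17)
    if candidate >= before:
        # This month's 17th is not earlier: take the previous month's.
        year, month = (before.year - 1, 12) if before.month == 1 else (before.year, before.month - 1)
        candidate = datetime(year, month, 17)
    return candidate

# The first run should fire at the August 17, 2020 schedule point, so
# start_date is the schedule point immediately before it.
wanted_trigger = datetime(2020, 8, 17)
start_date = prev_17th(wanted_trigger)
print(start_date)  # 2020-07-17 00:00:00
```

In the DAG file this corresponds to start_date=datetime.datetime(2020, 7, 17), with schedule_interval='0 0 17 * *' left unchanged.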
Upvotes: 3