I have the following DAG set up:
```python
import datetime

from airflow import models

default_dag_args = {
    'start_date': datetime.datetime(2021, 6, 25, 0, 0),
    'email': '[email protected]',
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=30),
}

with models.DAG(
        'foobar',
        schedule_interval="30 5,7,9 * * *",
        default_args=default_dag_args,
        catchup=False) as dag:
    ...  # task definitions were not included in the question
```
The behaviour that I want to have is that the DAG will execute at 5:30, 7:30 and 9:30 UTC every day. The behaviour that I'm seeing is that the 5:30 run executes at 7:30 UTC, the 7:30 run executes at 9:30 and the 9:30 run executes at 5:30 the next day.
I think I kind of have a vague idea of why this is happening based on the docs: 9:30 marks the end of the schedule period, and so the 9:30 run executes at the beginning of the next period. I can't figure out how to get the behaviour I want, though. The DAG doesn't reference the schedule time anywhere in its code; it just needs to run at 5:30, 7:30 and 9:30, and the 'run time' as Airflow considers it doesn't matter.
Is there any way to get a DAG to run at absolute times? If not, what schedule can I set to get the behaviour I desire?
Airflow is not a cron job scheduler. It calculates `start_date + schedule_interval` and executes each run at the end of its interval. The reason behind this is explained in this answer.
In your case, `start_date=datetime(2021, 6, 25)` with `schedule_interval="30 5,7,9 * * *"` gives:

- 1st run with `execution_date` 2021-06-25 5:30 starts running on 2021-06-25 7:30
- 2nd run with `execution_date` 2021-06-25 7:30 starts running on 2021-06-25 9:30
- 3rd run with `execution_date` 2021-06-25 9:30 starts running on 2021-06-26 5:30
- 4th run with `execution_date` 2021-06-26 5:30 starts running on 2021-06-26 7:30
- 5th run with `execution_date` 2021-06-26 7:30 starts running on 2021-06-26 9:30
- 6th run with `execution_date` 2021-06-26 9:30 starts running on 2021-06-27 5:30
- 7th run with `execution_date` 2021-06-27 5:30 starts running on 2021-06-27 7:30
- 8th run with `execution_date` 2021-06-27 7:30 starts running on 2021-06-27 9:30
- 9th run with `execution_date` 2021-06-27 9:30 starts running on 2021-06-28 5:30
- and so on.
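The pattern in the list above can be reproduced with a small pure-Python sketch (no Airflow needed). It assumes the cron expression only fires at minute 30 of hours 5, 7 and 9, that the first `execution_date` is the first schedule point at or after `start_date`, and that each run starts at the next schedule point. The helpers `schedule_points` and `runs` are hypothetical illustrations, not Airflow APIs, and the sketch ignores `catchup`.

```python
from datetime import datetime, timedelta

# Hours and minute taken from the cron expression "30 5,7,9 * * *".
HOURS = (5, 7, 9)
MINUTE = 30


def schedule_points(start: datetime, count: int) -> list[datetime]:
    """Return the first `count` schedule points at or after `start`."""
    points = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(points) < count:
        for hour in HOURS:
            tick = day.replace(hour=hour, minute=MINUTE)
            if tick >= start and len(points) < count:
                points.append(tick)
        day += timedelta(days=1)
    return points


def runs(start: datetime, count: int) -> list[tuple[datetime, datetime]]:
    """Pair each execution_date with the moment its run actually starts,
    i.e. the end of its interval (the next schedule point)."""
    points = schedule_points(start, count + 1)
    return list(zip(points[:-1], points[1:]))


for execution_date, started in runs(datetime(2021, 6, 25), 6):
    print(f"execution_date {execution_date:%Y-%m-%d %H:%M} "
          f"-> starts at {started:%Y-%m-%d %H:%M}")
```

Each run "lags" by exactly one slot, which is why the 9:30 run only starts at 5:30 the next morning.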
Note that you still get 3 runs per day as you expect (except on the first date); it's just a matter of understanding how scheduling works. If you want 3 runs on the first date as well, change your `start_date` to `datetime(2021, 6, 24, 9, 30)`. The `execution_date` is a logical date. If needed, you can reference related dates within your DAG code using macros. For example:
I mentioned that the 6th run has `execution_date` 2021-06-26 9:30. Within that run, the macros give you:

- `prev_execution_date` is 2021-06-26 7:30
- `next_execution_date` is 2021-06-27 5:30
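In Airflow these values come from the `prev_execution_date` and `next_execution_date` template macros. The sketch below only mimics what those macros resolve to for this particular cron expression, assuming "previous" and "next" mean the nearest schedule points strictly before and after the run's `execution_date`. `prev_point` and `next_point` are hypothetical helpers, not part of Airflow.

```python
from datetime import datetime, timedelta

# Hours and minute taken from the cron expression "30 5,7,9 * * *".
HOURS = (5, 7, 9)
MINUTE = 30


def prev_point(execution_date: datetime) -> datetime:
    """Latest schedule point strictly before execution_date."""
    day = execution_date.replace(hour=0, minute=0, second=0, microsecond=0)
    while True:
        for hour in reversed(HOURS):
            tick = day.replace(hour=hour, minute=MINUTE)
            if tick < execution_date:
                return tick
        day -= timedelta(days=1)  # nothing earlier today, try the day before


def next_point(execution_date: datetime) -> datetime:
    """Earliest schedule point strictly after execution_date."""
    day = execution_date.replace(hour=0, minute=0, second=0, microsecond=0)
    while True:
        for hour in HOURS:
            tick = day.replace(hour=hour, minute=MINUTE)
            if tick > execution_date:
                return tick
        day += timedelta(days=1)  # nothing later today, try the next day


sixth = datetime(2021, 6, 26, 9, 30)
print("prev_execution_date:", prev_point(sixth))  # 2021-06-26 07:30:00
print("next_execution_date:", next_point(sixth))  # 2021-06-27 05:30:00
```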
Note: your code has `catchup=False`, so the exact dates I wrote here won't match what you see, but that affects only the first run. The following runs follow the same logic.
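To check the `start_date` suggestion above, here is a rough sketch that counts how many runs actually start on each calendar day. It uses the same assumptions as the walkthrough (first `execution_date` is the first schedule point at or after `start_date`, each run starts at the following schedule point) and, like the walkthrough, ignores `catchup=False`. `start_times` and `runs_per_day` are hypothetical names used only for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hours and minute taken from the cron expression "30 5,7,9 * * *".
HOURS = (5, 7, 9)
MINUTE = 30


def start_times(start: datetime, count: int) -> list[datetime]:
    """Wall-clock start times of the first `count` runs.

    Each run starts at the schedule point after its execution_date,
    so we collect count + 1 points and drop the first one."""
    ticks = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(ticks) < count + 1:
        for hour in HOURS:
            tick = day.replace(hour=hour, minute=MINUTE)
            if tick >= start and len(ticks) < count + 1:
                ticks.append(tick)
        day += timedelta(days=1)
    return ticks[1:]


def runs_per_day(start: datetime, count: int) -> dict[date, int]:
    """Number of runs starting on each calendar day."""
    return dict(Counter(t.date() for t in start_times(start, count)))


# Original start_date: only 2 runs start on 2021-06-25 (7:30 and 9:30).
print(runs_per_day(datetime(2021, 6, 25), 5))
# Suggested start_date: 3 runs start on 2021-06-25 (5:30, 7:30 and 9:30).
print(runs_per_day(datetime(2021, 6, 24, 9, 30), 6))
```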