Duck Hunt Duo

Reputation: 319

Airflow scheduling for specific times each day

I have the following DAG set up:

import datetime

from airflow import models

default_dag_args = {
    'start_date': datetime.datetime(2021, 6, 25, 0, 0),
    'email': '[email protected]',
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=30)
}

with models.DAG(
        'foobar',
        schedule_interval="30 5,7,9 * * *",
        default_args=default_dag_args,
        catchup=False) as dag:
    pass  # task definitions go here

The behaviour that I want to have is that the DAG will execute at 5:30, 7:30 and 9:30 UTC every day. The behaviour that I'm seeing is that the 5:30 run executes at 7:30 UTC, the 7:30 run executes at 9:30 and the 9:30 run executes at 5:30 the next day.

I think I kind of have a vague idea of why this is happening based on the docs - 9:30 marks the end of the schedule period and so the 9:30 run executes at the beginning of the next period. I can't figure out how to get the behaviour I want though. The DAG doesn't have any reference to the schedule time in the code, it just needs to run at 5:30, 7:30 and 9:30 and the 'run time' as Airflow considers it doesn't matter.

Is there any way to get a DAG to run at absolute times? If not, what schedule can I set to get the behaviour I desire?

Upvotes: 2

Views: 4178

Answers (1)

Elad Kalif

Reputation: 15931

Airflow is not a cron job scheduler. Airflow calculates start_date + schedule_interval and executes the job at the end of the interval. The reason behind this is explained in this answer.

In your case, start_date=datetime(2021, 6, 25) with schedule_interval="30 5,7,9 * * *" gives:

1st run with execution_date 2021-06-25 5:30 will start running on 2021-06-25 7:30

2nd run with execution_date 2021-06-25 7:30 will start running on 2021-06-25 9:30

3rd run with execution_date 2021-06-25 9:30 will start running on 2021-06-26 5:30

4th run with execution_date 2021-06-26 5:30 will start running on 2021-06-26 7:30

5th run with execution_date 2021-06-26 7:30 will start running on 2021-06-26 9:30

6th run with execution_date 2021-06-26 9:30 will start running on 2021-06-27 5:30

7th run with execution_date 2021-06-27 5:30 will start running on 2021-06-27 7:30

8th run with execution_date 2021-06-27 7:30 will start running on 2021-06-27 9:30

9th run with execution_date 2021-06-27 9:30 will start running on 2021-06-28 5:30

and so on...
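The table above can be reproduced with a small standard-library sketch. It does not use Airflow at all: the names `schedule_points` and `simulate_runs` are hypothetical helpers, and the hours and minute are hard-coded from the cron expression "30 5,7,9 * * *" rather than parsed. It only illustrates the rule that a run starts when the interval it covers has ended, and it ignores catchup.

```python
import datetime

# Hours and minute taken by hand from the cron expression "30 5,7,9 * * *".
HOURS = (5, 7, 9)
MINUTE = 30


def schedule_points(start, count):
    """Return the first `count` cron ticks at or after `start`."""
    points = []
    day = start.date()
    while len(points) < count:
        for hour in HOURS:
            tick = datetime.datetime.combine(day, datetime.time(hour, MINUTE))
            if tick >= start and len(points) < count:
                points.append(tick)
        day += datetime.timedelta(days=1)
    return points


def simulate_runs(start, count):
    """Pair each execution_date with the time the run actually starts.

    A run covers the interval between two consecutive ticks and starts
    once that interval has ended, i.e. at the next tick.
    """
    ticks = schedule_points(start, count + 1)
    return list(zip(ticks[:-1], ticks[1:]))


runs = simulate_runs(datetime.datetime(2021, 6, 25, 0, 0), 6)
for execution_date, starts_at in runs:
    print(f"execution_date {execution_date:%Y-%m-%d %H:%M} "
          f"-> starts {starts_at:%Y-%m-%d %H:%M}")
```

Calling `simulate_runs(datetime.datetime(2021, 6, 24, 9, 30), 3)` instead yields three runs that all start on 2021-06-25 (at 5:30, 7:30 and 9:30).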

Note that you still get 3 runs per day (except on the first date), as you expect; it is just a matter of understanding how scheduling works. If you want 3 runs on the first date as well, change your start_date to datetime(2021, 6, 24, 9, 30). The execution_date is a logical date. If needed, you can reference the relevant dates within your DAG code using macros. For example:

As mentioned, the execution_date of the 6th run is 2021-06-26 9:30. Using macros within that run gives you:

prev_execution_date is 2021-06-26 7:30

next_execution_date is 2021-06-27 5:30
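Inside a real DAG these values come from template variables such as `{{ prev_execution_date }}` and `{{ next_execution_date }}`. To check the arithmetic without running Airflow, here is a rough standard-library sketch; `ticks_for_day` and `neighbours` are hypothetical helpers, and the schedule is again hard-coded from "30 5,7,9 * * *".

```python
import datetime

# Hours and minute taken by hand from the cron expression "30 5,7,9 * * *".
HOURS = (5, 7, 9)
MINUTE = 30


def ticks_for_day(day):
    """All schedule ticks that fall on the given calendar day."""
    return [datetime.datetime.combine(day, datetime.time(hour, MINUTE))
            for hour in HOURS]


def neighbours(execution_date):
    """Return the schedule ticks immediately before and after execution_date."""
    one_day = datetime.timedelta(days=1)
    day = execution_date.date()
    # Looking one day back and one day ahead is enough for a daily schedule.
    candidates = (ticks_for_day(day - one_day)
                  + ticks_for_day(day)
                  + ticks_for_day(day + one_day))
    prev_tick = max(t for t in candidates if t < execution_date)
    next_tick = min(t for t in candidates if t > execution_date)
    return prev_tick, next_tick


prev_execution_date, next_execution_date = neighbours(
    datetime.datetime(2021, 6, 26, 9, 30))
print("prev_execution_date:", prev_execution_date)
print("next_execution_date:", next_execution_date)
```

For the 6th run this gives 2021-06-26 7:30 as the previous date and 2021-06-27 5:30 as the next one, matching the values listed above.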

Note: your code has catchup=False, so the exact dates I wrote here won't be the same, but that affects only the first run. The following runs will follow the same logic.

Upvotes: 1
