Jie Liu

Reputation: 107

Airflow: Why does my DAG not run on the expected day?

I created a task that I wish to run at 17:00 from Monday to Friday.

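from airflow import DAG
from airflow.utils.dates import days_ago

# `args` (the default_args dict) is defined elsewhere and not shown here
dag = DAG(
    dag_id='dummy',
    schedule_interval="00 17 * * 1-5",  # 17:00, Monday to Friday
    start_date=days_ago(2),
    default_args=args,
    tags=['test']
)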

But it doesn't run as expected. I did a lot of research online and noticed that "airflow schedule tasks at the END of scheduling period". Unfortunately, I still don't understand how Airflow schedules tasks. In my case, does that mean the Airflow scheduler will start at 17:00 on Monday but run the job on Tuesday? If that is true, will it work if I change the schedule_interval to "00 17 * * 0-4"?

Upvotes: 1

Views: 1570

Answers (2)

Dawid Laszuk

Reputation: 1977

I had the same issue. In "7 common errors to check when debugging Airflow DAGs", the first item describes exactly this:

an Airflow DAG will execute at the completion of its schedule_interval, which means one schedule_interval AFTER the start date. An hourly DAG, for example, will execute its 2:00 PM run when the clock strikes 3:00 PM. This happens because Airflow can’t ensure that all of the data from 2:00 PM - 3:00 PM is present until the end of that hourly interval.
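The timing rule in the quote can be illustrated with plain datetime arithmetic. This is only a minimal sketch, not Airflow code: `trigger_time` is a hypothetical helper and the date is made up for the example.

```python
from datetime import datetime, timedelta


def trigger_time(logical_date: datetime, interval: timedelta) -> datetime:
    """Return the earliest moment a run labelled `logical_date` can start.

    Hypothetical helper for illustration; it is not part of Airflow.
    The run covers [logical_date, logical_date + interval), so it can
    only start once that interval has fully elapsed.
    """
    return logical_date + interval


# The "2:00 PM" run of an hourly DAG starts when the clock strikes 3:00 PM.
run_2pm = datetime(2021, 11, 1, 14, 0)
print(trigger_time(run_2pm, timedelta(hours=1)))
```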

This suggests that this odd behaviour is by design. As mentioned elsewhere, upgrading Airflow and using Timetables is the correct way to deal with it.

In case you can't upgrade, I think your schedule should be "0 17 * * 0-5", and you should expect the first run to be skipped.

Upvotes: 0

Collin McNulty

Reputation: 484

Airflow 2.2 introduced Timetables to help you get the effect you're looking for. Even without them, your current schedule with 1-5 should give you runs at the times you expect, except that the execution_date will be that of the previous scheduled run (a run that occurs on Friday will have Thursday's date, and a run that occurs on Monday will have Friday's date).
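To make the mapping above concrete, here is a rough sketch in plain Python that pairs each execution_date with the time the run is actually triggered for the schedule "0 17 * * 1-5". It does not use Airflow or a cron parser. `weekday_ticks` and `runs` are hypothetical helpers, the start date is made up, and time zones and catchup are ignored.

```python
from datetime import datetime, timedelta


def weekday_ticks(start: datetime, count: int) -> list:
    """Return the first `count` Monday-to-Friday 17:00 times at or after `start`.

    Hypothetical stand-in for the cron expression "0 17 * * 1-5".
    """
    ticks = []
    day = start.replace(hour=17, minute=0, second=0, microsecond=0)
    if day < start:
        day += timedelta(days=1)
    while len(ticks) < count:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            ticks.append(day)
        day += timedelta(days=1)
    return ticks


def runs(start: datetime, count: int) -> list:
    """Pair each execution_date with the time its run is triggered.

    A run is labelled with one schedule tick and triggered at the next one.
    """
    ticks = weekday_ticks(start, count + 1)
    return list(zip(ticks[:-1], ticks[1:]))


# 2021-11-01 is a Monday; it is used here only as an example start date.
for execution_date, triggered_at in runs(datetime(2021, 11, 1), 5):
    print(f"execution_date {execution_date:%Y-%m-%d %H:%M} -> triggered {triggered_at:%Y-%m-%d %H:%M}")
```

Under these assumptions every trigger time still falls on a weekday at 17:00. Only the execution_date label lags by one schedule tick, so the run triggered on Monday carries the previous Friday's date.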

Upvotes: 2
