Reputation: 111
I'm new to Airflow.
My goal is to run a DAG daily, with the first run starting one hour from now.
I think I'm misunderstanding Airflow's "run at the end of the interval" scheduling rule.
From the docs [(Airflow Docs)][1]
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
I set schedule_interval as follows:
schedule_interval="00 15 * * *"
and start_date as follows:
start_date=datetime(year=2019, month=8, day=7)
My assumption was that if it is now 14:00:00 UTC on 07-08-2019, my DAG would be executed in exactly one hour. However, my DAG is not starting at all.
Upvotes: 1
Views: 1147
Reputation: 62
With schedule_interval="00 15 * * *" and start_date=07-08-2019:
The first run will be on 08-08-2019 at 15:00 UTC (3:00 PM), provided you created this DAG before 15:00 on 07-08-2019.
Upvotes: 2
Reputation: 2591
There is a whole FAQ page about Airflow jobs not being scheduled: https://airflow.apache.org/faq.html
The key thing to notice here is:
The Airflow scheduler triggers the task soon after the start_date + schedule_interval is passed.
As I understand it, you want a task with start_date=datetime(year=2019, month=8, day=7) to be triggered daily at 15:00 UTC. schedule_interval="00 15 * * *" means the task runs every day at 15:00 UTC. According to the docs, the scheduler triggers your task only after start_date + schedule_interval has passed, so Airflow won't trigger it until the next day, August 8th 2019 15:00:00 UTC. Alternatively, you can change the start_date day to the 6th, in which case the first run happens on August 7th 2019 15:00:00 UTC. It might be easier to understand this the ETL way: you can only process the data for a given period after that period has ended. So August 7th 2019 15:00:00 UTC is the start of your first period, and you need to wait until August 8th 2019 15:00:00 UTC, when that period ends, for the task covering it to run.
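The arithmetic can be sketched in plain Python. This is only an illustration of the "trigger at the end of the interval" rule for a daily schedule, not Airflow's actual scheduler code; the helper name first_daily_run is hypothetical and it assumes a fixed one-day interval in UTC.

```python
from datetime import datetime, timedelta


def first_daily_run(start_date, hour, minute=0):
    """Return (execution_date, trigger_time) for the first run of a daily schedule.

    Hypothetical helper: a simplified model of the end-of-interval rule,
    assuming a cron such as "00 15 * * *" (once a day at hour:minute).
    """
    # First schedule point at or after start_date: the start of the first period.
    execution_date = start_date.replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if execution_date < start_date:
        execution_date += timedelta(days=1)

    # The run is triggered only once the period it covers has ended.
    trigger_time = execution_date + timedelta(days=1)
    return execution_date, trigger_time


# start_date from the question: the first trigger is a day later than expected.
print(first_daily_run(datetime(2019, 8, 7), hour=15))
# Moving start_date back one day makes the first trigger land on August 7th.
print(first_daily_run(datetime(2019, 8, 6), hour=15))
```

With start_date set to August 7th, the first period starts at 2019-08-07 15:00 and the run for it is triggered at 2019-08-08 15:00; with August 6th, the trigger moves to 2019-08-07 15:00.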
Also, note that Airflow has both execution_date and start_date; you can find more here
Upvotes: 3