Reputation: 612
I’m new to Airflow and I’m trying to understand how to use the scheduler correctly. Basically I want to schedule tasks the same way as I use cron. There’s a task that needs to be run every 5 minutes and I want it to start at the dag run next even 5 min slot after I add the DAG file to dags directory or after I have made some changes to the dag file.
I know that the DAG is run at the end of the schedule_interval. If I add a new DAG and use start_date=days_ago(0) then I will get the unnecessary runs starting from the beginning of the day. It also feels stupid to hardcode some specific start date on the dag file i.e. start_date=datetime(2019, 9, 4, 10, 1, 0, 818988). Is my approach wrong or is there some specific reason why the start_date needs to be set?
Upvotes: 0
Views: 6824
Reputation: 612
I think I found an answer to my own question from the official documentation: https://airflow.apache.org/scheduler.html#backfill-and-catchup
By turning off the catchup, DAG run is created only for the most recent interval. So then I can set the start_date to anything in the past and define the dag like this:
dag = DAG('good-dag', catchup=False, default_args=default_args, schedule_interval='*/5 * * * *')
Upvotes: 5