Reputation: 21
I am trying to understand how Airflow sets the execution_date for a DAG. I have set catchup=False in the DAG. Here is my DAG definition:
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

dag = DAG(
    'child',
    max_active_runs=1,
    description='A sample pipeline run',
    start_date=days_ago(0),
    catchup=False,
    schedule_interval=timedelta(minutes=5)
)
Now, since catchup=False, the scheduler should skip the runs prior to the current time. It does skip them, but strangely it does not set the execution_date correctly.
Here are the execution dates of the runs:
We can see the runs are scheduled at a frequency of 5 minutes, but why does Airflow append seconds and milliseconds to the time? This is impacting my sensors later. Note that the behaviour is fine when catchup=True.
Upvotes: 0
Views: 1207
Reputation: 21
I did some experiments. The execution_date comes out correctly when I specify a cron expression instead of a timedelta. So my DAG is now:
from airflow import DAG
from airflow.utils.dates import days_ago

dag = DAG(
    'child',
    max_active_runs=1,
    description='A sample pipeline run',
    start_date=days_ago(0),
    catchup=False,
    schedule_interval='*/5 * * * *'
)
Hope this helps someone. I have also raised a bug for this, which can be tracked at https://github.com/apache/airflow/issues/11758
Upvotes: 1
Reputation: 2956
Regarding execution_date
you should have a look at the scheduler documentation. The execution_date is the beginning of the covered period, but the run gets triggered at the end of that period (one schedule_interval after the start_date).
The scheduler won’t trigger your tasks until the period it covers has ended e.g., A job with schedule_interval set as @daily runs after the day has ended. This technique makes sure that whatever data is required for that period is fully available before the dag is executed. In the UI, it appears as if Airflow is running your tasks a day late
Note If you run a DAG on a schedule_interval of one day, the run with execution_date 2019-11-21 triggers soon after 2019-11-21T23:59. Let’s Repeat That, the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
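To make that "end of period" behaviour concrete, here is a minimal sketch (the DAG id, dates and task are illustrative, not from the question; the import path and provide_context flag assume Airflow 1.10.x): a task that prints its execution_date next to the wall-clock time at which it actually runs.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_dates(execution_date, **_):
    # For the run covering 2019-11-21, execution_date is 2019-11-21T00:00:00,
    # but the scheduler only triggers the run after the period has ended,
    # i.e. shortly after 2019-11-21T23:59.
    print("execution_date (start of the covered period):", execution_date)
    print("actual wall-clock time of this run:          ", datetime.utcnow())


with DAG(
    dag_id="execution_date_demo",      # hypothetical DAG id
    start_date=datetime(2019, 11, 20),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="print_dates",
        python_callable=print_dates,
        provide_context=True,          # required on Airflow 1.10.x
    )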
The article Scheduling Tasks in Airflow might also be worth a read.
You should also avoid setting the start_date to a relative value: this can lead to unexpected behaviour because the value is re-interpreted every time the DAG file is parsed.
There is a long description within the Airflow FAQ:
We recommend against using dynamic values as start_date, especially datetime.now() as it can be quite confusing. The task is triggered once the period closes, and in theory an @hourly DAG would never get to an hour after now as now() moves along.
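Applied to the DAG from the question, that means replacing days_ago(0) with a fixed value; a sketch is below (the concrete date is just an illustrative placeholder):

from datetime import datetime, timedelta

from airflow import DAG

dag = DAG(
    'child',
    max_active_runs=1,
    description='A sample pipeline run',
    # Static start_date: interpreted the same way every time the scheduler
    # re-reads the DAG file (the date itself is an arbitrary example).
    start_date=datetime(2020, 10, 1),
    catchup=False,
    schedule_interval=timedelta(minutes=5)
)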
Upvotes: 0