Reputation: 1465
I have a couple of schedules that run one interval behind. My configuration looks like this:
from datetime import datetime, timedelta
from airflow import DAG

args = {
    'owner': 'test',
    'start_date': datetime.now(),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'retry_delay': timedelta(seconds=30)
}

dag = DAG(
    dag_id='feed_response',
    default_args=args,
    concurrency=4,
    schedule_interval='0 2 * * 6',  # Saturdays at 02:00
    dagrun_timeout=timedelta(minutes=20)
)
This schedule should have run an instance for last Saturday, but it ran for the previous Saturday instead. I've noticed this behavior in a couple of our jobs. Is there a reason why the scheduler seems to lag one interval behind?
Upvotes: 0
Views: 1112
Reputation: 2513
The Airflow documentation recommends against using dynamic values for start_date, especially datetime.now():
https://airflow.incubator.apache.org/faq.html#what-s-the-deal-with-start-date
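To illustrate why a moving start_date is a problem, here is a minimal pure-Python sketch. It does not use Airflow itself. The helper `first_run_is_due`, the `parse_times` list and the dates are hypothetical, and the rule it encodes (a run is triggered once start_date plus one interval has passed) is a simplification of what the scheduler does.

```python
from datetime import datetime, timedelta

# Assumption: a plain one-week interval stands in for the cron schedule '0 2 * * 6'.
INTERVAL = timedelta(weeks=1)


def first_run_is_due(start_date, now, interval=INTERVAL):
    """Simplified rule: the first run is due once its whole interval has elapsed."""
    return now >= start_date + interval


# Three hypothetical moments at which the DAG file is re-parsed (all Saturdays, 02:00).
parse_times = [datetime(2017, 1, 7, 2, 0) + timedelta(days=d) for d in (0, 7, 14)]

# Dynamic start_date: re-evaluated on every parse, so it always equals "now"
# and the first interval never closes.
dynamic_due = [first_run_is_due(start_date=now, now=now) for now in parse_times]

# Static start_date: fixed in the past, so the first interval closes a week later.
STATIC_START = datetime(2017, 1, 7, 2, 0)
static_due = [first_run_is_due(start_date=STATIC_START, now=now) for now in parse_times]

print(dynamic_due)  # [False, False, False]
print(static_due)   # [False, True, True]
```

In the DAG from the question, the equivalent change would be to replace datetime.now() with a fixed datetime in the past.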
Upvotes: 0
Reputation: 398
This behavior is described in the "Common Pitfalls" section of the Airflow wiki (https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls):
Understanding the execution date: Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available.
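Applied to the weekly schedule in the question, this means the run that starts on a given Saturday carries the previous Saturday's execution date. The arithmetic can be sketched in plain Python as below. The helper `actual_run_time` is hypothetical, not an Airflow function; it assumes fixed-length intervals, and the weekly date is only illustrative.

```python
from datetime import datetime, timedelta


def actual_run_time(execution_date, interval):
    """The run labelled `execution_date` covers the window
    [execution_date, execution_date + interval) and starts once that window closes."""
    return execution_date + interval


# Daily example from the quoted passage: data for 2016-02-19 is processed
# at midnight on 2016-02-20.
daily = actual_run_time(datetime(2016, 2, 19), timedelta(days=1))
print(daily)  # 2016-02-20 00:00:00

# Weekly schedule '0 2 * * 6' (Saturdays at 02:00): the run with execution date
# Saturday 2016-02-20 starts on the following Saturday.
weekly = actual_run_time(datetime(2016, 2, 20, 2, 0), timedelta(weeks=1))
print(weekly)  # 2016-02-27 02:00:00
```

So a run appearing "for the previous Saturday" is the expected behavior, not a lagging scheduler.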
Upvotes: 2