J.Fratzke

Reputation: 1465

Airflow Scheduler is slow by 1 interval

I have a couple of schedules that run one interval late. My configuration looks like this:

from datetime import datetime, timedelta

from airflow import DAG

args = {
    'owner': 'test',
    'start_date': datetime.now(),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'retry_delay': timedelta(seconds=30),
}

dag = DAG(
    dag_id='feed_response',
    default_args=args,
    concurrency=4,
    schedule_interval='0 2 * * 6',  # every Saturday at 02:00
    dagrun_timeout=timedelta(minutes=20),
)

This schedule should have run an instance for last Saturday. Instead, the latest run is for the Saturday before that. I've noticed this behavior in a couple of our jobs. Is there a reason why the scheduler seems to lag one interval behind?

Upvotes: 0

Views: 1112

Answers (2)

liferacer

Reputation: 2513

The Airflow documentation recommends against using dynamic values for start_date, especially datetime.now():
https://airflow.incubator.apache.org/faq.html#what-s-the-deal-with-start-date
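As a minimal sketch of what a static start_date could look like, here is the question's default_args with a fixed date instead of datetime.now(). The date 2016-01-02 is a hypothetical value chosen only because it falls on a Saturday; use whatever date your first interval should begin on.

```python
from datetime import datetime, timedelta

# Hypothetical fixed date (a Saturday). Unlike datetime.now(), this value
# is the same every time the DAG file is parsed.
args = {
    'owner': 'test',
    'start_date': datetime(2016, 1, 2),
    'retries': 3,
    'retry_delay': timedelta(seconds=30),
}
```

The dictionary would then be passed to the DAG as default_args, exactly as in the question.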

Upvotes: 0

Christian Trebing

Reputation: 398

This behavior is described in the "Common Pitfalls" section of the Airflow wiki (https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls):

Understanding the execution date: Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available.
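Applied to the weekly schedule in the question, this means the run that starts on a given Saturday carries the execution date of the Saturday before it. The sketch below illustrates that arithmetic with the standard library only. It does not call Airflow, and the helper names latest_schedule_point and latest_execution_date are made up for this example; it assumes the cron expression '0 2 * * 6' fires every Saturday at 02:00.

```python
from datetime import datetime, timedelta

# Assumption: '0 2 * * 6' fires once a week, on Saturday at 02:00.
INTERVAL = timedelta(weeks=1)


def latest_schedule_point(now):
    """Return the most recent Saturday 02:00 at or before `now`."""
    # datetime.weekday(): Monday is 0, Saturday is 5.
    days_since_saturday = (now.weekday() - 5) % 7
    candidate = (now - timedelta(days=days_since_saturday)).replace(
        hour=2, minute=0, second=0, microsecond=0
    )
    # On a Saturday before 02:00 the candidate is still in the future.
    if candidate > now:
        candidate -= INTERVAL
    return candidate


def latest_execution_date(now):
    """Return the execution date of the newest run that can have started."""
    # A run starts once its interval has ended, so it is labelled with
    # the start of that interval, one period before the trigger time.
    return latest_schedule_point(now) - INTERVAL


now = datetime(2016, 2, 22, 9, 0)  # a Monday
print(latest_schedule_point(now))  # 2016-02-20 02:00:00, when the run starts
print(latest_execution_date(now))  # 2016-02-13 02:00:00, the label it carries
```

So a run labelled with the previous Saturday, triggered last Saturday, is the expected behavior rather than a lag.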

Upvotes: 2
