Reputation: 877
I have the DAG:
dag = DAG(
    dag_id='example_bash_operator',
    default_args=args,
    schedule_interval='0 0 * * *',
    start_date=days_ago(2),
    dagrun_timeout=timedelta(minutes=60),
    tags=['example']
)
What is the significance of dag.cli()? What role does cli() play?
if __name__ == "__main__":
    dag.cli()
Today is 14 October. When I add catchup=False, the DAG executes for 13 October. Shouldn't it execute only for the 14th? Without catchup=False it executes for the 12th and 13th, which makes sense because it backfills. But with catchup=False, why does it execute for 13 October?
dag = DAG(
    dag_id='example_bash_operator',
    default_args=args,
    schedule_interval='0 0 * * *',
    start_date=days_ago(2),
    catchup=False,
    dagrun_timeout=timedelta(minutes=60),
    tags=['example']
)
Upvotes: 2
Views: 1653
Reputation: 2936
You should avoid setting start_date to a relative value. This can lead to unexpected behaviour because the value is re-evaluated every time the DAG file is parsed.
There is a long description within the Airflow FAQ:
We recommend against using dynamic values as start_date, especially datetime.now() as it can be quite confusing. The task is triggered once the period closes, and in theory an @hourly DAG would never get to an hour after now as now() moves along.
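To illustrate the point about relative values, here is a minimal standard-library sketch. It does not use Airflow itself: relative_start is a hypothetical stand-in for days_ago (assumed to return midnight n days before the moment the file is parsed), and the year 2020 is an assumption, since the question only gives the day and month.

```python
from datetime import datetime, timedelta


def relative_start(parse_time, n=2):
    """Hypothetical stand-in for days_ago(n): midnight, n days before parse_time."""
    midnight = parse_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# A fixed start_date never changes, no matter when the DAG file is parsed.
FIXED_START = datetime(2020, 10, 12)  # assumed year

# Two assumed parse times, one day apart.
first_parse = datetime(2020, 10, 14, 9, 0)
later_parse = datetime(2020, 10, 15, 9, 0)

# The relative start date shifts forward with every new day of parsing.
moved = relative_start(first_parse) != relative_start(later_parse)
```

With these assumed values, the relative start date is 12 October on the first parse and 13 October on the later one, whereas FIXED_START stays put.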
Regarding dag.cli(), I would remove this whole part. It is not required for the DAG to be executed by the Airflow scheduler; see this question.
Regarding catchup=False and why it executes for the 13th of October, have a look at the scheduler documentation:
The scheduler won’t trigger your tasks until the period it covers has ended e.g., A job with schedule_interval set as @daily runs after the day has ended. This technique makes sure that whatever data is required for that period is fully available before the dag is executed. In the UI, it appears as if Airflow is running your tasks a day late
Note If you run a DAG on a schedule_interval of one day, the run with execution_date 2019-11-21 triggers soon after 2019-11-21T23:59. Let’s Repeat That, the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
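Applied to the dates in the question, the rule quoted above can be sketched in plain Python. This is only a rough model of the behaviour described in the documentation, not Airflow's scheduler code: completed_intervals is a hypothetical helper, the year 2020 and the 09:00 current time are assumptions, and catchup=False is approximated as "keep only the most recent completed interval".

```python
from datetime import datetime, timedelta


def completed_intervals(start_date, now, interval=timedelta(days=1)):
    """Return execution dates whose schedule interval has fully ended by `now`."""
    dates = []
    current = start_date
    while current + interval <= now:
        dates.append(current)
        current += interval
    return dates


start = datetime(2020, 10, 12)       # days_ago(2) evaluated on 14 October (assumed year)
now = datetime(2020, 10, 14, 9, 0)   # assumed current time on 14 October

# With catchup enabled, every completed interval gets a run: 12 and 13 October.
all_runs = completed_intervals(start, now)

# With catchup=False, only the most recent completed interval runs: 13 October.
latest_only = all_runs[-1:]
```

The interval for 14 October has not ended yet, so under this model no run with that execution date exists until midnight on the 15th. That is why the latest run is labelled the 13th.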
Also the article Scheduling Tasks in Airflow might be worth a read.
Upvotes: 3