Reputation: 61
I have an Airflow DAG that I need to backfill. When I change the start_date and run the DAG, it ignores the start_date and just starts from the current date.
I copied my code to a new Python file (for example, from 'dag_xx.py' to 'dag_xx_backfill.py') and changed the name of the DAG itself and of all its tasks. I also used the Delete button in the UI to clear the whole state of the DAG and start over. Still, it doesn't start from my desired start_date.
The DAG's default_args contain some settings, like:
    from datetime import datetime, timedelta

    from airflow import DAG

    # local_tz is defined elsewhere in the file (not shown here)

    default_args = {
        "owner": "airflow",
        "depends_on_past": False,
        "retries": 1,
        "retry_delay": timedelta(minutes=1),
        "catchup": True,
    }

    test_dag_backfill = DAG(
        dag_id="test_dag_backfill",
        description="backfill the data",
        default_args=default_args,
        start_date=datetime(2020, 11, 1, 3, 0, tzinfo=local_tz),
        schedule_interval="0 * * * *",  # or @hourly
        max_active_runs=1,
    )
As you can see, the start_date is November 1st, but the DAG starts from the current date (December 2nd).
Do you have any idea what I am missing here?
Upvotes: 2
Views: 1022
Reputation: 61
Well, I found the reason. Putting catchup in default_args has no effect, because catchup is a property of the DAG itself, while default_args can only define default properties for the operators (tasks). What I did was pass catchup directly as a DAG argument, and it worked.
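For reference, here is a minimal sketch of what the corrected definition might look like, with catchup moved out of default_args and passed to the DAG constructor. The question does not show how local_tz is defined, so the pendulum.timezone("UTC") line is only a placeholder assumption; substitute your own timezone. The sketch keeps schedule_interval as in the question; newer Airflow releases (2.4 and later) use a schedule argument instead.

```python
from datetime import datetime, timedelta

import pendulum
from airflow import DAG

# Assumption: the question does not show local_tz, so UTC is a stand-in.
local_tz = pendulum.timezone("UTC")

# Operator-level defaults only: these are handed to every task in the DAG.
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}

test_dag_backfill = DAG(
    dag_id="test_dag_backfill",
    description="backfill the data",
    default_args=default_args,
    start_date=datetime(2020, 11, 1, 3, 0, tzinfo=local_tz),
    schedule_interval="0 * * * *",  # or @hourly
    max_active_runs=1,
    catchup=True,  # DAG-level argument: schedule every missed interval since start_date
)
```

With catchup=True on the DAG, the scheduler should create one run per completed hourly interval since start_date, and max_active_runs=1 means those runs are worked through one at a time.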
Thanks to: https://stackoverflow.com/a/54692189/10874265
Upvotes: 2