Mahshad Shirzaei

Reputation: 61

Airflow DAG doesn't start based on `start_date`, it starts from now

I have an Airflow DAG that I need to backfill. When I change the start_date and run the DAG, it ignores the start_date and starts from the current date instead. I copied my code to a new Python file (for example, from 'dag_xx.py' to 'dag_xx_backfill.py') and renamed the DAG itself and all of its tasks. I also used the Delete button in the UI to clear the DAG's entire state and start it all over again. Even so, it doesn't start from my desired start_date. The DAG's default_args contain some settings, like:

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
    "catchup": True
}

test_dag_backfill = DAG(
    dag_id="test_dag_backfill",
    description="backfill the data",
    default_args=default_args,
    start_date=datetime(2020, 11, 1, 3, 0, tzinfo=local_tz),
    schedule_interval="0 * * * *", # or @hourly
    max_active_runs=1,
)

As you can see, the start_date is November 1st, but it starts from the current date (December 2nd). Do you have any idea what I am missing here?

Upvotes: 2

Views: 1022

Answers (1)

Mahshad Shirzaei

Reputation: 61

Well, I found the reason. Setting catchup in default_args has no effect, because catchup is a DAG property, while default_args only supplies default arguments for the DAG's operators. I passed catchup directly to the DAG constructor instead, and it worked. Thanks to: https://stackoverflow.com/a/54692189/10874265
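For reference, here is a minimal sketch of the corrected definition, with catchup moved out of default_args and onto the DAG. The question never shows how local_tz is defined, so the pendulum.timezone("UTC") line is only a placeholder assumption; substitute your own timezone. It also assumes an Airflow version that still accepts schedule_interval, as in the question.

```python
from datetime import datetime, timedelta

import pendulum
from airflow import DAG

# Placeholder assumption: the original post does not show local_tz.
local_tz = pendulum.timezone("UTC")

# default_args is forwarded to the operators, so it should only hold
# operator-level settings. catchup is not one of them.
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}

test_dag_backfill = DAG(
    dag_id="test_dag_backfill",
    description="backfill the data",
    default_args=default_args,
    start_date=datetime(2020, 11, 1, 3, 0, tzinfo=local_tz),
    schedule_interval="0 * * * *",  # or "@hourly"
    max_active_runs=1,
    catchup=True,  # DAG-level argument: schedule the runs missed since start_date
)
```

With catchup set on the DAG, the scheduler should create the hourly runs from start_date onward, one at a time because of max_active_runs=1.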

Upvotes: 2
