drum

Reputation: 5661

airflow: How to modify DAG to backfill?

I have a DAG that has been running for a while. Now I have more old data available and want to backfill it.

I changed my parameters from:

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'drum',
    'depends_on_past': False,
    'start_date': datetime(2019, 7, 1),
    'retries': 2,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    dag_id='dag_one',
    catchup=False,
    default_args=default_args,
    schedule_interval='@weekly',
    max_active_runs=1
)

To:

default_args = {
    'owner': 'drum',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 1), ### Update
    'retries': 2,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    dag_id='dag_one',
    catchup=True,  ### Update
    default_args=default_args,
    schedule_interval='@weekly',
    max_active_runs=1
)

However, this does not trigger the backfill. I am using the GUI exclusively, as I do not have access to a terminal.

Upvotes: 0

Views: 337

Answers (1)

amoskaliov

Reputation: 799

As far as I remember, you also need to change your dag_id (e.g. to dag_one_v2) when changing start_date. Be careful, though: Airflow treats a new dag_id as a new DAG, so the run history recorded under the old dag_id no longer applies to it. With catchup=True, Airflow will then schedule every run from the new start_date, including the runs since 2019-07-01 that have already been executed. You may also need to add some kind of check for whether your data has already been processed.
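
A minimal sketch of such a check, in plain Python so it can be tried outside Airflow. The names OLD_START_DATE, already_processed and process_week are hypothetical, and the sketch assumes every run since the old start_date succeeded; if that is not the case, a lookup against the target table would be a safer test.

```python
from datetime import datetime

# Assumption: the old dag_one covered everything from its start_date onward.
OLD_START_DATE = datetime(2019, 7, 1)


def already_processed(execution_date, processed_from=OLD_START_DATE):
    """Return True if the old DAG is assumed to have handled this date."""
    return execution_date >= processed_from


def process_week(execution_date):
    """Hypothetical task body: skip weeks handled before, process the rest."""
    if already_processed(execution_date):
        return "skipped"
    # The real processing for the backfilled week would go here.
    return "processed"
```

In the renamed DAG, something like process_week could be called from a PythonOperator's python_callable with the run's execution date taken from the task context, so that only the weeks before 2019-07-01 do real work.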

Upvotes: 2
