data_addict

Reputation: 894

Airflow schedule not updating

I created a DAG that will run on a weekly basis. Below is what I tried and it's working as expected.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

SCHEDULE_INTERVAL = timedelta(weeks=1, seconds=00, minutes=00, hours=00)
default_args = {
    'depends_on_past': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=2),
    'wait_for_downstream': True,
    'provide_context': True,
    'start_date': datetime(2020, 12, 20, hour=00, minute=00, second=00)
}

with DAG("DAG", default_args=default_args, schedule_interval=SCHEDULE_INTERVAL, catchup=True) as dag:
    t1 = BashOperator(
        task_id='dag_schedule',
        bash_command='echo DAG',
        dag=dag)

As per the schedule, it ran on the 27th (i.e. start date 20 in the script). Because of a change in requirements, I updated the start date so that the first run falls on the 30th (i.e. 23 in the script) instead of the 27th. My idea is to start the schedule from the 30th and run every week from then on.

After I changed the start date, the DAG does not pick up the new start date, and I'm not sure why. When I deleted the DAG (it is a test DAG, so I could delete it; in prod I can't) and created a new DAG with the same name and the new start date, it ran as per the schedule.

Upvotes: 3

Views: 3637

Answers (2)

de-learner

Reputation: 256

As per the Airflow docs:

When needing to change your start_date and schedule interval, change the name of the dag (a.k.a. dag_id) - I follow the convention : my_dag_v1, my_dag_v2, my_dag_v3, my_dag_v4, etc...

  • Changing schedule interval always requires changing the dag_id, because previously run TaskInstances will not align with the new schedule interval
  • Changing start_date without changing schedule_interval is safe, but changing to an earlier start_date will not create any new DagRuns for the time between the new start_date and the old one, so tasks will not automatically backfill to the new dates. If you manually create DagRuns, tasks will be scheduled, as long as the DagRun date is after both the task start_date and the dag start_date.

So if we change the start date, we need to either change the DAG name or delete the existing DAG so that it is recreated under the same name (deleting it removes the metadata related to the previous DAG from the metadata database).
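To illustrate the point about alignment, the execution dates implied by the old and new start dates can be listed with plain `datetime`. This is only a sketch using the dates from the question (start date moved from 20 to 23 December 2020, weekly interval). `execution_dates` is a hypothetical helper, not an Airflow API, and it does not reproduce the scheduler's actual logic.

```python
from datetime import datetime, timedelta


def execution_dates(start_date, interval, count):
    """List the first `count` execution dates implied by a start date and interval."""
    return [start_date + i * interval for i in range(count)]


weekly = timedelta(weeks=1)

# Grid implied by the original start date (20 Dec 2020).
old_grid = execution_dates(datetime(2020, 12, 20), weekly, 4)
# Grid implied by the updated start date (23 Dec 2020).
new_grid = execution_dates(datetime(2020, 12, 23), weekly, 4)

# The two weekly grids never share a date, so runs already recorded
# under the old start date do not fall on the new grid.
overlap = set(old_grid) & set(new_grid)
print(overlap)  # set()
```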

Source

Upvotes: 8

Elad Kalif

Reputation: 15911

Your DAG, as you defined it, will be triggered on 6-Jan-2021.

Airflow schedules tasks at the END of the interval (see doc reference).

So per your settings:

SCHEDULE_INTERVAL = timedelta(weeks=1, seconds=00, minutes=00, hours=00)

and

'start_date': datetime(2020, 12, 30, hour=00, minute=00, second=00)

This means the first run will be on 6-Jan-2021, because 30-Dec-2020 + 1 week = 6-Jan-2021. Note that the execution_date of this run will be 2020-12-30.
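The arithmetic above can be checked with plain `datetime`, without Airflow installed. This is only a sketch of the end-of-interval rule described in this answer; `first_trigger` is a hypothetical helper, not an Airflow API.

```python
from datetime import datetime, timedelta

SCHEDULE_INTERVAL = timedelta(weeks=1)


def first_trigger(start_date, interval):
    """Return (execution_date, trigger_time) for the first run.

    The run covers the interval that starts at start_date, and it is
    triggered once that interval has ended.
    """
    execution_date = start_date
    trigger_time = start_date + interval
    return execution_date, trigger_time


execution_date, trigger_time = first_trigger(datetime(2020, 12, 30), SCHEDULE_INTERVAL)
print(execution_date.date(), trigger_time.date())  # 2020-12-30 2021-01-06
```

The same helper applied to the original start date of 20-Dec-2020 gives a trigger on 27-Dec-2020, which matches the first run described in the question.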

Upvotes: 1
