sjishan

Reputation: 3672

Airflow scheduling old tasks after the DAG is unpaused

I paused a DAG for one month. The job runs every 10 minutes. Now that I have turned it back on, I can see Airflow is trying to run every interval since the day I paused it.

Each time I clear the task list, more tasks get scheduled. There will likely be 2000+ tasks.

I want the DAG to run only the current tasks and discard all the tasks from the past.

Upvotes: 0

Views: 1654

Answers (1)

Philipp Johannis

Reputation: 2956

I think catchup should solve your problem. It is an argument of the DAG; from the Airflow documentation:

An Airflow DAG with a start_date, possibly an end_date, and a schedule_interval defines a series of intervals which the scheduler turns into individual DAG Runs and executes. The scheduler, by default, will kick off a DAG Run for any interval that has not been run since the last execution date (or has been cleared). This concept is called Catchup.
If your DAG is written to handle its catchup (i.e., not limited to the interval, but instead to Now for instance), then you will want to turn catchup off. This can be done by setting catchup=False in the DAG:

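from datetime import datetime

from airflow import DAG

default_args = {'owner': 'airflow'}  # placeholder; use your own defaults

dag = DAG(
    'tutorial',
    default_args=default_args,
    start_date=datetime(2015, 12, 1),
    description='A simple tutorial DAG',
    schedule_interval='@daily',
    # Do not create runs for intervals missed in the past
    catchup=False)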
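
To get a feel for the size of the backlog in the question, here is a rough plain-Python model of the difference between the two settings. It is not Airflow code: the function names and the dates are made up for illustration, and it only assumes that with catchup on the scheduler creates one run per missed interval, while with catchup off it creates a run for the latest interval only.

```python
from datetime import datetime, timedelta


def missed_intervals(paused_at: datetime, resumed_at: datetime, every: timedelta) -> int:
    """Count the complete schedule intervals that elapsed while the DAG was paused."""
    return (resumed_at - paused_at) // every


def runs_on_unpause(paused_at: datetime, resumed_at: datetime,
                    every: timedelta, catchup: bool) -> int:
    """Simplified model of how many DAG runs get created after unpausing."""
    missed = missed_intervals(paused_at, resumed_at, every)
    if catchup:
        # One run per interval that was never executed.
        return missed
    # Only the most recent interval is scheduled (if any elapsed at all).
    return min(missed, 1)


# Hypothetical dates: paused for 30 days on a 10-minute schedule.
paused_at = datetime(2021, 1, 1)
resumed_at = datetime(2021, 1, 31)
every = timedelta(minutes=10)

backlog = runs_on_unpause(paused_at, resumed_at, every, catchup=True)
latest_only = runs_on_unpause(paused_at, resumed_at, every, catchup=False)
print(backlog, latest_only)
```

Under these assumptions a 30-day pause on a 10-minute schedule leaves thousands of intervals to catch up on, which matches what you are seeing when you clear tasks and more keep appearing.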

Upvotes: 2
