jabberwocky

Reputation: 55

Airflow DAG does not run at specified time with catchup=False

I have an Airflow DAG for which backfilling doesn't really make sense. I found out that since Airflow 1.8 you can pass catchup=False to the DAG, so that only the most recent run is started. However, I want the DAG to start at midnight and run daily, and that is the problem: the DAG starts immediately, not at midnight. When I clear all DAG runs, it starts immediately again. After that the DAG does run daily, but each run is scheduled at the time of the first run plus one day instead of at midnight.

How can I have a DAG which only starts running the most recent job, and starts at a specific time (midnight)?

Here is the code I use:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {'depends_on_past': False,
                'start_date': datetime(2013, 1, 1)}

with DAG('test_dag',
         default_args=default_args,
         schedule_interval=timedelta(days=1),
         catchup=False
         ) as dag:
    test = DummyOperator(task_id='test')
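The drift described above (each run lands at the first run's time plus one day) can be illustrated with a simplified, stdlib-only model. This is not Airflow's scheduler code; it only assumes, as the question reports, that with schedule_interval=timedelta(days=1) each run is the previous run plus the interval. The helper name fixed_interval_runs and the 14:37 start time are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def fixed_interval_runs(first_run: datetime, interval: timedelta, n: int) -> List[datetime]:
    """Simplified model (not Airflow code): every run is the previous run plus
    `interval`, so the time of day of the first run is carried into all later runs."""
    return [first_run + i * interval for i in range(n)]


# Hypothetical: the DAG was first picked up by the scheduler at 14:37.
for run in fixed_interval_runs(datetime(2017, 6, 1, 14, 37), timedelta(days=1), 3):
    print(run)
```

Under this model nothing ever pulls the runs back to midnight, which matches the behaviour described in the question.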

Upvotes: 2

Views: 3592

Answers (1)

Chengzhi

Reputation: 2591

You can pass a cron expression as schedule_interval, for example schedule_interval="0 0 * * *" to run every day at midnight. More details can be found here: https://airflow.apache.org/scheduler.html#dag-runs
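A rough, stdlib-only model of what schedule_interval="0 0 * * *" combined with catchup=False should produce. It assumes the behaviour described in the Airflow scheduler documentation: a run is labelled with the start of its interval (the execution_date) and triggered at the end of that interval, and with catchup disabled only the latest fully elapsed interval is scheduled. It is not Airflow's implementation, and the helper name midnight_schedule is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def midnight_schedule(unpaused_at: datetime, n: int) -> List[Tuple[datetime, datetime]]:
    """Simplified model of a daily "0 0 * * *" schedule with catchup=False.

    Returns (execution_date, trigger_time) pairs. Only the latest interval that
    has already ended is scheduled first, so its trigger time is in the past
    and that run starts right away; later runs trigger exactly at midnight.
    """
    today = datetime(unpaused_at.year, unpaused_at.month, unpaused_at.day)
    first_start = today - timedelta(days=1)  # latest midnight-to-midnight interval that has ended
    runs = []
    for i in range(n):
        start = first_start + timedelta(days=i)
        runs.append((start, start + timedelta(days=1)))  # triggered at the interval's end
    return runs


for execution_date, trigger_time in midnight_schedule(datetime(2017, 6, 1, 14, 37), 3):
    print(execution_date, "->", trigger_time)
```

Under these assumptions the cron schedule still yields one immediate run after the DAG is unpaused, but every later run is aligned to midnight instead of to the unpause time.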

Also note that Airflow runs in UTC, so adjust your "midnight" to the correct time zone.
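Converting local midnight to a UTC cron expression can be sketched with the standard library alone. This assumes a fixed UTC offset: daylight saving time is ignored, so a zone that changes offset during the year would need a different expression in summer and winter. The helper name utc_cron_for_local_midnight is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_cron_for_local_midnight(utc_offset: timedelta) -> str:
    """Return a daily cron expression, in UTC, that fires at local midnight
    for a fixed UTC offset (no DST handling)."""
    local_tz = timezone(utc_offset)
    # For a fixed offset any date gives the same answer; 2000-01-01 is arbitrary.
    local_midnight = datetime(2000, 1, 1, tzinfo=local_tz)
    utc_time = local_midnight.astimezone(timezone.utc)
    # Cron field order: minute hour day-of-month month day-of-week
    return f"{utc_time.minute} {utc_time.hour} * * *"


print(utc_cron_for_local_midnight(timedelta(hours=2)))   # UTC+2
print(utc_cron_for_local_midnight(timedelta(hours=-5)))  # UTC-5
```

For UTC+2 this gives "0 22 * * *", which could be used directly as schedule_interval="0 22 * * *" in the DAG; computing the string once and pasting the literal avoids hiding the schedule behind a helper.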

Upvotes: 2
