Reputation: 55
I have an Airflow DAG for which backfilling doesn't really make sense. I figured out that with Airflow 1.8 you can give the DAG the parameter catchup=False, so it will only start the most recent job.
That said, I want to have the DAG start at midnight and run daily.
The problem is that the DAG starts immediately and not at midnight. When I clear all DAG runs, it starts immediately again. After that the DAG does run daily, but at the wrong time: the time it first started, plus one day.
How can I have a DAG which only starts running the most recent job, and starts at a specific time (midnight)?
Here is the code I use:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {'depends_on_past': False,
                'start_date': datetime(2013, 1, 1)}

with DAG('test_dag',
         default_args=default_args,
         schedule_interval=timedelta(days=1),
         catchup=False
         ) as dag:
    test = DummyOperator(task_id='test')
Upvotes: 2
Views: 3592
Reputation: 2591
You can pass a cron expression as schedule_interval instead of a timedelta, for example schedule_interval="0 0 * * *" to run every day at midnight. More detail can be found here: https://airflow.apache.org/scheduler.html#dag-runs
Also, Airflow runs on UTC, so adjust your "midnight" to the correct timezone.
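To work out which UTC cron expression matches your local midnight, something like the following sketch may help. It uses only the standard library; the function name utc_cron_for_local_midnight and the fixed UTC+1 zone are made up for illustration and are not part of Airflow.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def utc_cron_for_local_midnight(local_tz: tzinfo, on_date: date) -> str:
    """Build a daily cron expression, in UTC, that fires at local midnight.

    The UTC offset is the one in effect on `on_date`, so for zones with
    daylight saving time the result is only valid until the next switch.
    """
    # Midnight on the given date, expressed in the local timezone.
    local_midnight = datetime(on_date.year, on_date.month, on_date.day,
                              tzinfo=local_tz)
    # The same instant, expressed in UTC.
    as_utc = local_midnight.astimezone(timezone.utc)
    # Cron field order: minute hour day-of-month month day-of-week.
    return f"{as_utc.minute} {as_utc.hour} * * *"


# Hypothetical example zone: a fixed UTC+1 offset, no daylight saving.
utc_plus_one = timezone(timedelta(hours=1))
print(utc_cron_for_local_midnight(utc_plus_one, date(2017, 6, 1)))  # 0 23 * * *
```

The resulting string is what you would pass as schedule_interval in the DAG from the question. Because the offset is fixed at the date you pass in, a zone with daylight saving time will be off by an hour for part of the year.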
Upvotes: 2