Reputation: 97
Note: The crontab.guru links were breaking so I wrapped them in code blocks.
I have a DAG that should execute on Mondays at midnight Pacific time (08:00 UTC), bumped by 1 minute to avoid any overlap issues.
Originally the schedule interval was set to `1 8 */1 * 1`, which according to `https://crontab.guru/#1_8_*/1_*_1` means "At 08:01 UTC (03:01 EST, 00:01 PST) on every day-of-month if it's on Monday".
However, this caused the DAG to trigger every day at 08:01 UTC; the Monday condition seemed to be ignored.
The schedule interval was then updated to the simpler `1 8 * * 1`, which according to `https://crontab.guru/#1_8_*_*_1` means "At 08:01 UTC (03:01 EST, 00:01 PST) on Monday".
This stopped the DAG from executing every day, but it did not trigger on 2019-02-18, the first Monday following the update. I've read other posts indicating that the start date might cause this issue, but this task's start date is datetime(2019, 2, 11, 0, 0, 0, 0, pytz.UTC), which is two intervals before the 2019-02-18 run date.
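To sanity-check the simpler expression, here is a minimal sketch (plain stdlib datetime, not Airflow or a cron library) that scans every minute of one week and keeps only the times matching `1 8 * * 1`, i.e. minute 1, hour 8, Monday:

```python
from datetime import datetime, timedelta

def matches_1_8_star_star_1(dt):
    # minute == 1, hour == 8, day-of-week == Monday (weekday() == 0)
    return dt.minute == 1 and dt.hour == 8 and dt.weekday() == 0

# Scan every minute of the week starting Monday 2019-02-18 00:00 UTC.
start = datetime(2019, 2, 18)
hits = [
    start + timedelta(minutes=m)
    for m in range(7 * 24 * 60)
    if matches_1_8_star_star_1(start + timedelta(minutes=m))
]

print(hits)  # expect exactly one hit: Monday 2019-02-18 08:01
```

So the expression itself only matches once per week; the missing 2019-02-18 run has to come from Airflow's scheduling behavior rather than the cron pattern.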
Here is the complete DAG/task definition (without imports or specific names):
dag = DAG(
    dag_id="dag",
    description="dag",
    # At 08:01 UTC (03:01 EST, 00:01 PST) on Monday
    # (https://crontab.guru/#1_8_*_*_1)
    schedule_interval="1 8 * * 1",
    catchup=False,
)

task = PythonOperator(
    task_id="handle",
    provide_context=True,
    python_callable=handle,
    dag=dag,
    retries=2,
    retry_delay=timedelta(minutes=15),
    start_date=datetime(2019, 2, 11, 0, 0, 0, 0, pytz.UTC),
)
Any idea why this wouldn't have executed after the 2019-02-18 08:01 UTC interval?
Upvotes: 4
Views: 3852
Reputation: 1031
EDIT:
The reason you do not see the run execute on the 18th is that you have catchup=False.
This causes the DAG to skip intervals that have already passed. If you want to see the DAG fill in the runs for the 18th and the 25th, you would need to set catchup=True.
Airflow DAGs execute at the END of the Schedule Interval, so if your start date is the current Monday and your interval is every Monday, the DAG will not execute for this Monday’s run until the following Monday.
The main idea here is that the data for the current Monday run is not available until the end of that interval period. This makes more sense if you think about it in terms of daily jobs. If you are running a job that is looking for today's data, that data set will not be complete until the end of today. So if you want to run the data for today, you need to execute your job tomorrow. This is just a convention that Airflow has adopted, like it or not.
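As a rough illustration of this convention (plain datetime arithmetic, not Airflow's actual scheduler code), a run is labeled with the start of its interval but is only triggered once the whole interval has elapsed:

```python
from datetime import datetime, timedelta

# Assumed weekly schedule: Mondays at 08:01 UTC, start_date 2019-02-11 00:00 UTC.
interval = timedelta(weeks=1)
first_tick = datetime(2019, 2, 11, 8, 1)  # first cron tick at/after start_date

# The run labeled with execution_date == first_tick only fires at the
# END of its interval, i.e. one week later.
execution_date = first_tick
actual_run_time = execution_date + interval

print(execution_date)   # start of the data interval the run covers
print(actual_run_time)  # when the scheduler actually triggers the run
```

So the run stamped 2019-02-11 is the one that actually fires on 2019-02-18, which is why the execution date always trails the wall-clock trigger time by one interval.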
If you would like to adjust the dates, you can use `{{ macros.ds_add(ds, 7) }}` to shift the execution date by 7 days.
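For reference, macros.ds_add shifts a "YYYY-MM-DD" date string by N days; a minimal stdlib sketch equivalent in spirit to that macro:

```python
from datetime import datetime, timedelta

def ds_add(ds, days):
    # Shift a "YYYY-MM-DD" date string by `days` days (days may be
    # negative) and return the result in the same string format.
    shifted = datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)
    return shifted.strftime("%Y-%m-%d")

print(ds_add("2019-02-11", 7))  # "2019-02-18"
```

In a template this turns the trailing execution date back into the wall-clock date the run actually fires on.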
Let me know if this answer makes sense; if not, I will expand on it. This convention has been the most nagging detail we have had to deal with while developing Airflow jobs.
Upvotes: 4