Reputation: 41
I am trying to run an Airflow DAG on the 2nd of every month at 11:00 AM, but I am failing to do so. My settings are:
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': today_date,
    'email': ['mymail'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=7),
}
dag = DAG('my_dag', default_args=default_args, schedule_interval='00 11 02 * *')
Airflow works flawlessly when I run a DAG on a daily basis:
schedule_interval='00 11 * * *'
but I can't seem to make it work on a monthly basis :(
Thanks!
Upvotes: 4
Views: 12680
Reputation: 86
I was having a similar issue and discovered that my understanding of Airflow's schedule_interval as an equivalent of a cron job was wrong! It has the same format, yes, but Airflow triggers tasks differently than cron: Airflow treats every run as a data interval, which starts at the time given by schedule_interval and ends just before the next run. The DAG run is actually executed at the data interval's end. The same applies to schedule_interval='@monthly'.
For example: your DAG, scheduled at 11:00 on the 2nd of each month, i.e. with execution_date=dt.datetime(YYYY, MM, 2, 11, 0, 0), will actually be triggered around 10:59:59 on the 2nd of the following month (2.MM+1.YYYY), i.e. at the end of that month-long interval. (Just wait patiently and you will see your code working in a month.)
Update: as mentioned in the other answers, since your today_date is dynamic, on every load of your DAG script the execution timetable is shifted one month ahead, so the DAG never actually runs. You need to use a constant start_date.
"All dates in Airflow are tied to the data interval concept in some way. The “logical date” (also called execution_date in Airflow versions prior to 2.2) of a DAG run, for example, denotes the start of the data interval, not when the DAG is actually executed.
Similarly, since the start_date argument for the DAG and its tasks points to the same logical date, it marks the start of the DAG’s first data interval, not when tasks in the DAG will start running. In other words, a DAG run will only be scheduled one interval after start_date."
See also: Airflow Start_Date And Execution_Date Explained
So, if you want to run your DAG "today", you need to specify 'start_date': month_ago_date. And if you use the execution_date parameter, keep in mind it is equal to data_interval_start, not data_interval_end, which is when the task actually runs.
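To see the interval logic concretely, here is a stdlib-only sketch (the add_months helper is a hypothetical illustration, not part of Airflow): the logical date marks the start of the data interval, and the DAG only runs once the interval has ended, one month later for this schedule.

```python
from datetime import datetime

def add_months(dt, n):
    # hypothetical helper: shift a datetime by n calendar months
    # (assumes the day of month exists in the target month, which
    # holds for day 2 used in this schedule)
    month = dt.month - 1 + n
    year = dt.year + month // 12
    month = month % 12 + 1
    return dt.replace(year=year, month=month)

# For schedule '0 11 2 * *', each data interval starts on the 2nd at 11:00.
logical_date = datetime(2021, 1, 2, 11, 0)   # data_interval_start / execution_date
actual_run = add_months(logical_date, 1)     # the DAG fires at the interval's end
print(logical_date, actual_run)              # 2021-01-02 11:00 vs. 2021-02-02 11:00
```

In other words, the run labeled with January's logical date does not execute until February.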
Upvotes: 1
Reputation: 51
If you want to run your DAG on the 2nd of every month at 11:00 AM, you can use this code:
schedule_interval = '0 11 2 * *'
dag = DAG(
    'DAG_ID',
    default_args=default_args,
    schedule_interval=schedule_interval,
)
In the schedule interval, 0 refers to the minute, 11 to the hour, 2 to the day of month, the first * to any month, and the last * to any day of week.
For more scheduling information, check this website: https://crontab.guru/#0_11_2__
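The field order described above can be sketched as a quick mapping (illustration only, not Airflow code):

```python
# map each cron field of '0 11 2 * *' to its meaning
fields = dict(zip(
    ['minute', 'hour', 'day_of_month', 'month', 'day_of_week'],
    '0 11 2 * *'.split(),
))
print(fields)
# {'minute': '0', 'hour': '11', 'day_of_month': '2', 'month': '*', 'day_of_week': '*'}
```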
Upvotes: 3
Reputation: 2498
In the comments you mention that you use datetime.today() for start_date, and that is exactly what's causing the problem. A job instance is started once the period it covers has ended, but in your case that will never happen, because the start date moves forward on every DAG parse. Try to adjust start_date to something like:
from datetime import date
from dateutil.relativedelta import relativedelta
start_date = date.today() + relativedelta(months=-1)
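If dateutil is not available, the same one-month shift can be sketched with the standard library alone (the month_ago helper is a hypothetical stand-in for relativedelta(months=-1)):

```python
import calendar
from datetime import date

def month_ago(d):
    # step back one calendar month, clamping the day to the
    # target month's length (e.g. March 31 -> February 28)
    year, month = (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

print(month_ago(date(2021, 3, 31)))  # 2021-02-28
```

Either way, the point is the same: the start date must lie a full interval in the past so the first data interval can close and the run can fire.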
I suggest re-reading the Scheduling & Triggers section in the documentation. It also took me a couple of tries to understand how to correctly schedule DAGs.
Upvotes: 1