rajat_th

Reputation: 33

Airflow scheduler doesn't work for monthly jobs schedule

I am trying to schedule a monthly Airflow job. I set the start date to

'start_date':datetime(2020,9,23),

which is today's date one month back, because of the 'start_date + schedule_interval' rule. I set my schedule interval to:

 schedule_interval="20 9 23 * *"

By this logic the job should run on 2020-10-23 at 09:20 UTC. But I don't know why it's not running or even creating an instance. I believe I did everything right: I set the start date to one month earlier and even tried catchup=True, but it doesn't help.
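To double-check the expected trigger time, here is a minimal sketch of the 'start_date + schedule_interval' rule using only the standard library. The helper names (next_monthly_point, first_run) are made up for illustration and are not Airflow functions; the sketch assumes a fixed day of month that exists in every month (28 or lower) and naive UTC datetimes.

```python
from datetime import datetime, timedelta


def next_monthly_point(after: datetime, day: int, hour: int, minute: int) -> datetime:
    """First 'minute hour day * *' time at or after `after` (hypothetical helper)."""
    candidate = datetime(after.year, after.month, day, hour, minute)
    if candidate < after:
        # This month's slot has already passed, so roll over to the next month.
        if after.month == 12:
            candidate = datetime(after.year + 1, 1, day, hour, minute)
        else:
            candidate = datetime(after.year, after.month + 1, day, hour, minute)
    return candidate


def first_run(start_date: datetime, day: int, hour: int, minute: int):
    """Return (execution_date, trigger_time) of the first scheduled run.

    Assumption: the run covering an interval is triggered once that
    interval has ended, i.e. at the following schedule point.
    """
    execution_date = next_monthly_point(start_date, day, hour, minute)
    trigger_time = next_monthly_point(
        execution_date + timedelta(minutes=1), day, hour, minute
    )
    return execution_date, trigger_time


execution_date, trigger_time = first_run(datetime(2020, 9, 23), day=23, hour=9, minute=20)
print(execution_date, trigger_time)
```

Under these assumptions, the first run covers the interval starting 2020-09-23 09:20 and is triggered on 2020-10-23 at 09:20 UTC, which matches what I expected.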

The job does run if I use a daily schedule, e.g.:

'start_date':airflow.utils.dates.days_ago(1)

and schedule interval as:

schedule_interval="20 9 * * *"

and it works fine. It ran a job today at 09:20 UTC.

Note: I have run the job manually before, so its last execution date is something else. Can that be the problem? If so, how can I resolve it, or will I have to create a new job?

Upvotes: 0

Views: 2296

Answers (1)

Philipp Johannis

Reputation: 2946

Changing the schedule_interval can cause problems, and it's recommended to create a new DAG. See Common Pitfalls on the Apache Airflow Confluence:

When needing to change your start_date and schedule interval, change the name of the dag (a.k.a. dag_id) - I follow the convention : my_dag_v1, my_dag_v2, my_dag_v3, my_dag_v4, etc...

  • Changing schedule interval always requires changing the dag_id, because previously run TaskInstances will not align with the new schedule interval
  • Changing start_date without changing schedule_interval is safe, but changing to an earlier start_date will not create any new DagRuns for the time between the new start_date and the old one, so tasks will not automatically backfill to the new dates. If you manually create DagRuns, tasks will be scheduled, as long as the DagRun date is after both the task start_date and the dag start_date.
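As a rough sketch of that naming convention applied to the question's schedule: the base name "my_dag", the version number, and the versioned_dag_id helper are all hypothetical, and the keyword arguments assume the Airflow 1.10/2.x-style DAG constructor (dag_id, start_date, schedule_interval, catchup). The Airflow import is left in a comment so the snippet runs without Airflow installed.

```python
from datetime import datetime

# Hypothetical version counter: bump it whenever start_date or
# schedule_interval changes, so the scheduler sees a brand-new DAG.
DAG_VERSION = 2


def versioned_dag_id(base: str, version: int) -> str:
    """Build a dag_id following the my_dag_v1, my_dag_v2, ... convention."""
    return f"{base}_v{version}"


dag_kwargs = {
    "dag_id": versioned_dag_id("my_dag", DAG_VERSION),
    "start_date": datetime(2020, 9, 23),
    "schedule_interval": "20 9 23 * *",  # 09:20 UTC on the 23rd of each month
    "catchup": True,
}

# With Airflow installed (assumed 1.10/2.x-style API):
#   from airflow import DAG
#   dag = DAG(**dag_kwargs)
print(dag_kwargs["dag_id"])
```

Since the new dag_id has no earlier manual runs attached to it, the scheduler should start from the start_date without any previously recorded execution dates getting in the way.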

Upvotes: 3
