dmjani

Reputation: 31

Weekly Airflow DAG is triggered twice by the scheduler

My Airflow DAG is triggered twice every Monday with the configuration below.

When I use the cron expression 30 11 * * 1, the DAG doesn't trigger at all, so I tried adding one more * to the expression. With 30 11 * * 1 * it does run.

default_args:
'start_date': airflow.utils.dates.days_ago(1)

DAG:
schedule_interval='30 11 * * 1 *'  # intended as a weekly run on Monday at 11:30

However, the DAG is triggered twice every Monday, one minute apart.

What could be the possible reason?

Upvotes: 1

Views: 3743

Answers (3)

dmjani

Reputation: 31

So finally, I figured out the issue.

Yes, the 5-field cron expression is the correct one. I am now using schedule_interval = '30 11 * * 1' (every Monday at 11:30 UTC).

It wasn't working because of my start_date:

'start_date': airflow.utils.dates.days_ago(1)

I found the explanation in a blog post on Airflow, "Trick to find the exact start_date via CRON expression".

If it's a weekly job, your start_date should be a week ago, so I changed it to 'start_date': airflow.utils.dates.days_ago(7)

Now it is working fine.
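
To illustrate why days_ago(1) never fired while days_ago(7) does, here is a minimal sketch using only the standard library. It is a simplified model, not Airflow code: it assumes the scheduler triggers a run only once the interval beginning at the first schedule point on or after start_date has fully elapsed, and that start_date is re-evaluated every time the DAG file is parsed. The helpers days_ago, first_monday_1130_on_or_after and would_trigger are hypothetical names written for this example.

```python
from datetime import datetime, timedelta


def days_ago(n, now):
    """Midnight n days before `now` (mimics airflow.utils.dates.days_ago)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


def first_monday_1130_on_or_after(start):
    """First point of the '30 11 * * 1' schedule at or after `start`."""
    candidate = start.replace(hour=11, minute=30, second=0, microsecond=0)
    # datetime.weekday(): Monday == 0
    candidate += timedelta(days=(0 - candidate.weekday()) % 7)
    if candidate < start:
        candidate += timedelta(days=7)
    return candidate


def would_trigger(now, lookback_days):
    """Assumed rule: a run fires once its whole weekly interval has passed."""
    start_date = days_ago(lookback_days, now)  # recomputed on every parse
    execution_date = first_monday_1130_on_or_after(start_date)
    return execution_date + timedelta(days=7) <= now


# 2019-01-14 is a Monday; check at 11:30 on that day.
monday = datetime(2019, 1, 14, 11, 30)
print(would_trigger(monday, 1))  # sliding one-day window
print(would_trigger(monday, 7))  # one-week window
```

Under this model a start_date of days_ago(1) keeps sliding forward, so a full week never elapses after it, whereas days_ago(7) leaves room for exactly one completed weekly interval.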

Thank you!!!

Upvotes: 2

brki

Reputation: 2780

The cron parser that Airflow uses (croniter) interprets the 6th field as seconds, as you can see here: https://github.com/kiorky/croniter/blob/master/src/croniter/tests/test_croniter.py#L14

I'm assuming that your DAG finishes in under a minute. On its next loop, the scheduler sees that the cron schedule still matches (for example on the 58th second), so it starts the DAG again.

I was having the same issue, because the Airflow documentation linked to a Wikipedia entry about cron that showed 6 fields. A 6-field expression is non-standard, and there is more than one implementation. For Airflow, the 6th field is interpreted as seconds.

Your 5-field cron expression should work, so maybe try it again? However, change the dag_id, or you may run into weird behaviour. From https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls : "Changing schedule interval always requires changing the dag_id, because previously run TaskInstances will not align with the new schedule interval"
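
As a rough illustration of the difference, the sketch below counts how many one-second instants match each expression around Monday 11:30. It is a hand-rolled stand-in written for this answer, not croniter itself: the matches and matching_instants helpers are hypothetical, only handle plain numbers and *, and ignore the day-of-month and month fields (both are * here). It assumes, as described above, that a trailing 6th field means seconds and that a 5-field expression implies second 0.

```python
from datetime import datetime, timedelta


def matches(expr, t):
    """True if datetime `t` matches a simple 5- or 6-field cron expression."""
    fields = expr.split()
    minute, hour, _dom, _month, dow = fields[:5]
    # Assumption: a 6th field is seconds; with 5 fields the second is 0.
    second = fields[5] if len(fields) == 6 else "0"

    def ok(field, value):
        return field == "*" or int(field) == value

    cron_dow = (t.weekday() + 1) % 7  # cron numbering: Sunday=0, Monday=1
    return (ok(minute, t.minute) and ok(hour, t.hour)
            and ok(dow, cron_dow) and ok(second, t.second))


def matching_instants(expr, start, seconds):
    """All one-second steps in [start, start + seconds) that match `expr`."""
    steps = (start + timedelta(seconds=s) for s in range(seconds))
    return [t for t in steps if matches(expr, t)]


# Three-minute window around Monday 2019-01-14 11:30.
window_start = datetime(2019, 1, 14, 11, 29)
print(len(matching_instants("30 11 * * 1", window_start, 180)))
print(len(matching_instants("30 11 * * 1 *", window_start, 180)))
```

With the 5-field form only 11:30:00 matches, while the 6-field form matches every second of the 11:30 minute, which gives a polling scheduler more than one chance to see a due run.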

Upvotes: 0

SMDC

Reputation: 717

The 6-field cron expression is incorrect; the first one you tried is correct. How many times did you run the DAG? I suggest you try schedule_interval='@weekly' first and see what happens.

Upvotes: 0
