Infinite

Reputation: 764

Airflow Scheduling first of month

I want to schedule a DAG to run on the first of each month at 5 AM UTC. Say I want the DAG to start running from 01/01/2021 at 5 AM. What should my start date and schedule interval be? I want the DAG run on 01/01/2021 to have that same date, 01/01/2021, as its execution date. Any leads on how this could be achieved?

Thanks

Upvotes: 2

Views: 2919

Answers (1)

NicoE

Reputation: 4873

The FAQs about execution_date may help you understand what's happening (see also DAG Runs):

Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if you want to summarize data for 2016-02-19, you would do it at 2016-02-20 midnight UTC, which would be right after all data for 2016-02-19 becomes available.

Basically, the DAG run with execution_date = 2021-01-01T05:00:00+00:00 will actually be executed one schedule_interval later, at 2021-02-01T05:00:00+00:00. The date on which the execution actually occurred is stored in the start_date attribute of the "dag_run" object (you can access it through the execution context parameters). It is the same date that you can find in the Explore UI >> Dag Runs >> Start Date column.
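To make the one-interval offset concrete, here is a minimal sketch in plain Python (no Airflow required) that lists the first few monthly execution dates alongside the earliest moment each run would be triggered. The helpers add_one_month and monthly_runs are hypothetical names made up for this illustration, not Airflow APIs, and the sketch assumes the schedule always falls on day 1 of the month:

```python
from datetime import datetime, timezone


def add_one_month(moment: datetime) -> datetime:
    """Return the same day and time in the following month.

    Assumption: `moment` falls on day 1, so the day always exists
    in the next month.
    """
    if moment.month == 12:
        return moment.replace(year=moment.year + 1, month=1)
    return moment.replace(month=moment.month + 1)


def monthly_runs(start_date: datetime, count: int):
    """Yield (execution_date, earliest_trigger_time) pairs for a monthly schedule."""
    execution_date = start_date
    for _ in range(count):
        # A run is triggered once the interval it covers has ended,
        # i.e. one schedule interval after its execution_date.
        trigger_time = add_one_month(execution_date)
        yield execution_date, trigger_time
        execution_date = trigger_time


start = datetime(2021, 1, 1, 5, tzinfo=timezone.utc)
runs = list(monthly_runs(start, 3))
for execution_date, trigger_time in runs:
    print(
        f"execution_date={execution_date.isoformat()}  "
        f"triggered at or after {trigger_time.isoformat()}"
    )
```

The first pair shows the run labelled 2021-01-01T05:00:00+00:00 being triggered no earlier than 2021-02-01T05:00:00+00:00, which matches the behaviour described above.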

Try creating a dummy DAG like the following:

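from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

# Default arguments applied to every task in the DAG.
args = {
    "owner": "airflow",
}

with DAG(
    dag_id="dummy_dag",
    default_args=args,
    # First execution_date: 2021-01-01 05:00 (naive datetimes default to UTC).
    start_date=datetime(2021, 1, 1, 5),
    # Cron expression: 05:00 on day 1 of every month.
    schedule_interval="0 5 1 * *",
) as dag:

    t1 = DummyOperator(task_id="task_1")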

After the first execution, you can use the CLI to calculate future execution dates:

~/airflow$ airflow dags next-execution dummy_dag -n 10 -h
usage: airflow dags next-execution [-h] [-n NUM_EXECUTIONS] [-S SUBDIR] dag_id

Get the next execution datetimes of a DAG.
It returns one execution unless the num-executions option is given

Let me know if that worked for you!

Upvotes: 1
