Reputation: 764
I want to schedule a DAG to run on the first of each month at 5 AM UTC. Say I want the DAG to start running from 01/01/2021 at 5 AM: what should my start date and schedule interval be? I want the DAG run on 01/01/2021 to have the same execution date, i.e. 01/01/2021. Any leads on how this could be achieved?
Thanks
Upvotes: 2
Views: 2919
Reputation: 4873
The FAQs about execution_date may help you understand what's happening (see also DAG Runs):
Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if you want to summarize data for 2016-02-19, You would do it at 2016-02-20 midnight UTC, which would be right after all data for 2016-02-19 becomes available.
Basically, the DAG run with execution_date = 2021-01-01T05:00:00+00:00 will actually be executed one schedule_interval later (2021-02-01T05:00:00+00:00). The date on which the execution actually occurred is stored in the start_date attribute of the "dag_run" object (you can access it through the execution context parameters). It is the same date that you can find in the Explore UI >> Dag Runs >> Start Date column.
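To make the one-interval offset concrete, here is a minimal sketch in plain Python (no Airflow required). The helper name run_time_for is hypothetical, and it assumes the execution_date already falls on the monthly schedule (the 1st at 05:00 UTC); it only mimics the arithmetic, not the scheduler itself.

```python
from datetime import datetime, timezone


def run_time_for(execution_date: datetime) -> datetime:
    """Return when a monthly run would actually start: the 1st of the
    following month at 05:00 UTC (one schedule interval later).

    Hypothetical helper; assumes execution_date is already on the schedule.
    """
    # Roll over to January of the next year when the month is December.
    year = execution_date.year + execution_date.month // 12
    month = execution_date.month % 12 + 1
    return datetime(year, month, 1, 5, tzinfo=timezone.utc)


# The run stamped 2021-01-01 starts on 2021-02-01.
print(run_time_for(datetime(2021, 1, 1, 5, tzinfo=timezone.utc)))
# The run that starts on 2021-01-01 is stamped 2020-12-01.
print(run_time_for(datetime(2020, 12, 1, 5, tzinfo=timezone.utc)))
```

In other words, under this scheduling model the run that fires on 01/01/2021 carries the execution date 01/12/2020, not 01/01/2021.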
Try creating a dummy DAG like the following:
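from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

# Default arguments applied to every task in the DAG.
args = {
    "owner": "airflow",
}

with DAG(
    dag_id="dummy_dag",
    default_args=args,
    # First schedule point: 2021-01-01 05:00 (UTC by default).
    start_date=datetime(2021, 1, 1, 5),
    # Cron expression: 05:00 on day 1 of every month.
    schedule_interval="0 5 1 * *",
) as dag:
    t1 = DummyOperator(task_id="task_1")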
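If you want to see both dates side by side, a task callable along these lines could log them. This is only a sketch: describe_run is a hypothetical name, it assumes the context keys "execution_date" and "dag_run" as passed to a PythonOperator callable in Airflow 2.0/2.1, and the demo at the bottom uses a stand-in object instead of a real DagRun so it runs without Airflow installed.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def describe_run(**context):
    """Report the scheduled execution_date next to the real start time."""
    execution_date = context["execution_date"]
    started = context["dag_run"].start_date
    message = (
        f"execution_date={execution_date.isoformat()} "
        f"started={started.isoformat()}"
    )
    print(message)
    return message


# Stand-in context for illustration only (not a real Airflow DagRun).
fake_context = {
    "execution_date": datetime(2021, 1, 1, 5, tzinfo=timezone.utc),
    "dag_run": SimpleNamespace(
        start_date=datetime(2021, 2, 1, 5, 0, 3, tzinfo=timezone.utc)
    ),
}
describe_run(**fake_context)
# prints: execution_date=2021-01-01T05:00:00+00:00 started=2021-02-01T05:00:03+00:00
```

Inside the dummy DAG, this would presumably be wired up with a PythonOperator whose python_callable is describe_run.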
After the first execution, you can use the CLI to calculate future execution dates:
~/airflow$ airflow dags next-execution dummy_dag -n 10 -h
usage: airflow dags next-execution [-h] [-n NUM_EXECUTIONS] [-S SUBDIR] dag_id
Get the next execution datetimes of a DAG.
It returns one execution unless the num-executions option is given
Let me know if that worked for you!
Upvotes: 1