Reputation: 945
Unfortunately, even after reading the many questions here and the FAQ page of the Airflow website, I still don't understand how Airflow schedules tasks. I have a very simple example task here:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    "depends_on_past": False,
    "start_date": datetime(2020, 5, 29),
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "example_dag_one",
    schedule_interval="30 8 * * *",  # cron: every day at 08:30
    catchup=False,
    default_args=default_args,
)

with dag:
    # Inside the `with dag:` block the operator is attached to `dag` automatically
    t1 = BashOperator(task_id="print_hello", bash_command="echo hello")
My naive view would be that this task would run on May 29th at 08:30. But as time passes, Airflow has not scheduled that task. If I change the cron expression to something like '* 8 * * *', it will schedule a task every minute (during the 08:00 hour).
However, when I use the same DAG with a start date of yesterday (May 28th in that case), the task is scheduled at 08:30, yet its execution date is the 28th (even though it ran on May 29th) and the start date in the web UI is May 29th. This is VERY confusing.
What I want from Airflow in the end is simple: "Here is Python code, run it at this time every day." So how can I achieve that? Again, let's say I want to schedule a task at 08:30 every day starting tomorrow.
Upvotes: 1
Views: 6725
Reputation: 393
Airflow waits for the entire schedule interval (here, 1 day) to complete, and only then starts the run.
So if you want your task to be executed today, 2020-05-29, you should set the start date so that one full schedule interval has already finished by then, i.e. set the start date to datetime(2020, 5, 28).
If the schedule interval is 1 week, the first run is launched 1 week after the start date, and so on.
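A minimal sketch of this rule for fixed-length intervals, using only the standard library. `first_trigger` and `start_date_for` are hypothetical helpers, not Airflow APIs; the model assumes the start date is aligned with the interval and ignores time zones:

```python
from datetime import datetime, timedelta
from typing import Tuple


def first_trigger(start_date: datetime, schedule_interval: timedelta) -> Tuple[datetime, datetime]:
    """Return (execution_date, trigger_time) of the first run (hypothetical model).

    The first run covers [start_date, start_date + schedule_interval), is
    stamped with start_date, and can only start once that period has ended.
    """
    return start_date, start_date + schedule_interval


def start_date_for(desired_trigger: datetime, schedule_interval: timedelta) -> datetime:
    """Start date to configure so that the first run fires at desired_trigger."""
    return desired_trigger - schedule_interval


# Daily: to run on 2020-05-29, set the start date one interval earlier.
print(start_date_for(datetime(2020, 5, 29), timedelta(days=1)))     # 2020-05-28 00:00:00
# Weekly: a start date of 2020-05-28 fires for the first time one week later.
print(first_trigger(datetime(2020, 5, 28), timedelta(weeks=1))[1])  # 2020-06-04 00:00:00
```

The same subtraction works for any fixed interval: the start date is always one schedule interval before the first moment you want the job to run.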
Upvotes: 0
Reputation: 2342
The answer can be found in Airflow official documentation:
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
Let’s Repeat That: The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
Applied to your case: with a start date of May 29th and the original cron expression, the DAG will run every day at 08:30, starting tomorrow, May 30th.
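To see how this plays out with a cron expression, here is a rough stdlib model of a daily "minute hour * * *" schedule. `first_cron_run` is a hypothetical helper that approximates Airflow 1.x behaviour (the first execution date is the first cron point at or after the start date, and the run fires one interval later); it ignores time zones and the `catchup` setting:

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def first_cron_run(start_date: datetime, hour: int, minute: int) -> Tuple[datetime, datetime]:
    """Return (execution_date, trigger_time) for a daily 'minute hour * * *' cron."""
    # The first cron point at or after start_date becomes the run's execution_date.
    execution_date = datetime.combine(start_date.date(), time(hour, minute))
    if execution_date < start_date:
        execution_date += timedelta(days=1)
    # A daily interval ends at the next cron point, which is when the run fires.
    return execution_date, execution_date + timedelta(days=1)


for start in (datetime(2020, 5, 29), datetime(2020, 5, 28)):
    stamp, fires = first_cron_run(start, hour=8, minute=30)
    print(f"start {start:%m-%d}: execution_date {stamp:%m-%d %H:%M}, runs at {fires:%m-%d %H:%M}")
```

This prints "start 05-29: execution_date 05-29 08:30, runs at 05-30 08:30" and "start 05-28: execution_date 05-28 08:30, runs at 05-29 08:30". The second line matches what the question observed: the run stamped the 28th actually executed on the 29th.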
Anyway, if you don't need a DAG to run at a specific time of day, you can simply set the schedule interval to '@daily', and it will be triggered at the beginning (00:00) of each day. If there are a lot of DAGs with @daily, don't worry: the scheduler and the workers will handle executing all of them. If you have DAGs that depend on other DAGs, there are mechanisms to chain them (for example, ExternalTaskSensor), so you still don't have to worry about specifying hours.
Upvotes: 3