Reputation: 67
edit: I figured out my problem. I didn't understand the difference between triggering a run, which executes it immediately, and leaving the DAG on and letting the scheduler do its job. The code is fine.
I wrote this simple program to figure out Airflow. On the hour it is supposed to write "hello world" to a file, but it's doing it immediately. Does anyone see where I am going wrong?
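from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

def print_hello():
    # Append to the file each time the task runs
    with open('helloword.txt', 'a') as f:
        f.write('Hello World!')

dag = DAG('hello_world', description='Simple tutorial DAG', schedule_interval='@hourly',
          start_date=datetime(2018, 5, 31), catchup=False)

hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)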
Upvotes: 1
Views: 109
Reputation: 13016
The start date is 2018-05-31 and the schedule interval is @hourly, so the execution date for the first run would normally be 2018-05-31T00:00:00, with that run actually starting at or shortly after 2018-05-31T01:00:00.
In this case, you have set catchup to False, so only the most recent DAG run will be created instead. I would expect that DAG run to have the execution date 2018-05-31T21:00:00 right now.
The current UTC time is 2018-05-31T22:00:00. Since the start date timestamp 2018-05-31T00:00:00 is in the past, the Airflow scheduler will schedule and start the task immediately.
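To make the arithmetic above concrete, here is a minimal sketch in plain Python. It is not Airflow's scheduler code; `latest_execution_date` is a hypothetical helper that only models the rule described here: a run is labelled with the start of its interval and is created once that interval has ended, and with catchup disabled only the most recent such run is kept.

```python
from datetime import datetime, timedelta

def latest_execution_date(start_date, interval, now):
    """Return the execution date of the most recent completed interval,
    or None if no interval has completed yet (hypothetical helper)."""
    if now < start_date + interval:
        # The first interval has not ended, so nothing is due yet.
        return None
    # Number of whole intervals that have elapsed since start_date.
    completed = (now - start_date) // interval
    # A run is labelled with the START of the interval it covers.
    return start_date + (completed - 1) * interval

hourly = timedelta(hours=1)
now = datetime(2018, 5, 31, 22, 0)

# Start date in the past: a run is due right away.
print(latest_execution_date(datetime(2018, 5, 31), hourly, now))

# Start date in the future: nothing is due yet.
print(latest_execution_date(datetime(2018, 6, 1), hourly, now))
```

With the values from this question, the first call gives 2018-05-31 21:00:00 and the second gives None, which is why a past start date leads to an immediate run and a future one does not.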
You can delete the DAG runs and task instances and then change the start date to 2018-06-01 if you want it to start fresh tomorrow. It would not run immediately in this case if you choose a start date in the future.
You can find a bit more info about how the scheduler works here:
Upvotes: 2
Reputation: 1608
Your code looks fine to me. Do you still see lines appended to the file if you turn your DAG off?
I think what you're seeing is the backfill executions running. You set your start date to today, which implicitly means midnight. Airflow will therefore catch up and fire off these DAG runs first before eventually running your task every hour.
Upvotes: 1