Vvan_G

Reputation: 215

Ensure sequential running of tasks (Apache Airflow)

Under the sequential executor, I have a DAG file where I specify three tasks that need to run sequentially (t1-->t2-->t3):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 6, 14, 23, 20),
    'email_on_failure': False,
    'email_on_retry': False,
    }

dag = DAG('test_dag', default_args=default_args, schedule_interval="*/5 * * * *")

t1 = BashOperator(
    task_id='form_dataset',
    bash_command='python 1.py',
    dag=dag)

t2 = BashOperator(
    task_id='form_features',
    bash_command='python 2.py',
    depends_on_past=True,
    dag=dag)

t3 = BashOperator(
    task_id='train',
    bash_command='python 3.py',
    depends_on_past=True,
    dag=dag)

t2.set_upstream(t1)
t3.set_upstream(t2)

I assumed the sequential behavior t1-->t2-->t3 would be the default, though that's not the case in my situation (the order is pretty much random, e.g. t1-->t2-->t2-->t1-->t3). What argument am I missing that would correct this behavior?

Upvotes: 1

Views: 9632

Answers (1)

Him

Reputation: 1649

You need to add the statement

t1 >> t2 >> t3

at the end of the file. More details for this are on the following link: https://airflow.incubator.apache.org/concepts.html#bitshift-composition

For completeness, you can also declare the same dependencies with the tasks' set_upstream() or set_downstream() methods.
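To illustrate how the bitshift form relates to set_upstream()/set_downstream(), here is a minimal, self-contained sketch using a toy Task class (not Airflow's real BaseOperator) that mimics the same composition rules:

```python
# Toy stand-in for an Airflow operator, illustrating how t1 >> t2 >> t3
# maps onto explicit set_downstream()/set_upstream() calls.
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []
        self.downstream = []

    def set_downstream(self, other):
        # "self runs before other"
        self.downstream.append(other)
        other.upstream.append(self)

    def set_upstream(self, other):
        # "other runs before self" -- the mirror of set_downstream
        other.set_downstream(self)

    def __rshift__(self, other):
        # t1 >> t2 is sugar for t1.set_downstream(t2);
        # returning `other` is what makes chaining t1 >> t2 >> t3 work
        self.set_downstream(other)
        return other

t1 = Task('form_dataset')
t2 = Task('form_features')
t3 = Task('train')

t1 >> t2 >> t3

assert [t.task_id for t in t2.upstream] == ['form_dataset']
assert [t.task_id for t in t3.upstream] == ['form_features']
```

The chaining works only because __rshift__ returns its right-hand operand, which is also why Airflow's bitshift composition reads left to right in execution order.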

Upvotes: 3
