Reputation: 1051
I have two workers and three tasks.
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# default_args is assumed to be defined above
dag = DAG('dummy_for_testing', default_args=default_args)

t1 = BashOperator(
    task_id='print_task1',
    bash_command='task1.py',
    dag=dag)

t2 = BashOperator(
    task_id='print_task2',
    bash_command='task2.py',
    dag=dag)

t3 = BashOperator(
    task_id='print_task3',
    bash_command='task3.py',
    dag=dag)

t1 >> t2 >> t3
Let's say I am performing these tasks (t1, t2, t3) on a particular file. Currently everything runs on one worker, but I want to set up a second worker that takes the output of the first task and performs t2 and then t3, so that queue1 can start t1 on the next file. How can I make this work with two workers? I was thinking of using queues, but I couldn't figure out how to make queue2 wait until task t1 in queue1 has finished.
Upvotes: 0
Views: 900
Reputation: 3257
You shouldn't have to do anything other than start both workers; they will pick up tasks as they become available, within the concurrency/parallelism constraints defined in your config.
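For reference, these are the kinds of settings that control those constraints. A minimal airflow.cfg sketch; option names and defaults vary between Airflow versions, so check your own config:
[core]
# maximum number of task instances running concurrently across the whole installation
parallelism = 32
# maximum number of task instances allowed to run concurrently within a single DAG
dag_concurrency = 16

[celery]
# number of task slots each Celery worker offers
worker_concurrency = 16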
In the example you gave, the tasks might run entirely on worker 1, entirely on worker 2, or on a mixture of both. This is because t2 won't start until t1 has completed. In the time between t1 completing and t2 starting, both workers will be idle (assuming you don't have other DAGs running), and one of them will win the race to reserve the t2 task.
If you need specific tasks to run on particular workers (say, because one or more workers have more resources available, or special hardware), you can specify the queue at the task level. The queue won't make a difference to the order in which tasks run, as the Airflow scheduler will ensure a task doesn't run until all of its upstream tasks have run successfully.
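As a rough sketch (the queue name high_memory is just a placeholder, not something from your setup):
t3 = BashOperator(
    task_id='print_task3',
    bash_command='task3.py',
    queue='high_memory',  # only workers subscribed to this queue will pick up t3
    dag=dag)
With the CeleryExecutor you then start a worker that listens on that queue, e.g. airflow worker -q high_memory on Airflow 1.x (on Airflow 2 the equivalent is airflow celery worker -q high_memory). Tasks without an explicit queue go to the default queue from your config.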
Upvotes: 4