Citysto

Reputation: 31

Airflow - How to have a worker take all dag run tasks?

I am currently working with Airflow and Celery to process files. A worker needs to download files, process them, and re-upload them afterwards. My DAGs run fine with only one worker, but when I add another, things get complicated.

Workers take tasks as they become available. Worker1 can take the "processing downloaded files" task, but it was Worker2 that took the "downloading files" task, so the task fails because it cannot process files that do not exist on its machine.

Is there a way to tell the workers (or the scheduler) that a DAG run must execute entirely on one worker? I know about queues, but I am already using them.

Upvotes: 3

Views: 2041

Answers (1)

z1k

Reputation: 121

In this case, you can use an Airflow Variable to store all your worker node names, for example:

  • Variable: worker_list
  • Value: boxA, boxB, boxC
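
If it helps, here is a minimal sketch of setting that Variable from Python; the box names are placeholders, and you can just as well create the Variable through the Airflow UI or CLI:

    from airflow.models import Variable

    # One-off setup: store the hostnames of all Celery worker boxes.
    # The box names below are placeholders for your own worker hostnames.
    Variable.set("worker_list", "boxA, boxB, boxC")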

When starting an Airflow worker, you can specify one or more job queues, e.g. airflow worker -q job_queue1,job_queue2. For your case, I would run airflow worker -q af_<hostname> on each box.

In your DAG code, you just need to read that worker_list Airflow Variable, select a box at random, and assign all of the DAG's tasks to the af_<random_selected_box> queue, as sketched below.
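
A minimal sketch of that idea, assuming Airflow 1.x with the Celery executor; the DAG id, task ids, and bash commands are placeholders, and each worker is assumed to have been started with airflow worker -q af_<its hostname>:

    import random

    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.bash_operator import BashOperator
    from airflow.utils.dates import days_ago

    # Read the list of worker boxes, pick one for this parse of the DAG file,
    # and send every task to that box's dedicated queue.
    worker_list = Variable.get("worker_list", default_var="boxA")
    selected_box = random.choice([w.strip() for w in worker_list.split(",")])
    queue_name = "af_{}".format(selected_box)

    dag = DAG(
        dag_id="single_worker_pipeline",
        start_date=days_ago(1),
        schedule_interval=None,
    )

    download = BashOperator(
        task_id="download_files",
        bash_command="echo downloading",
        queue=queue_name,
        dag=dag,
    )

    process = BashOperator(
        task_id="process_files",
        bash_command="echo processing",
        queue=queue_name,
        dag=dag,
    )

    upload = BashOperator(
        task_id="upload_files",
        bash_command="echo uploading",
        queue=queue_name,
        dag=dag,
    )

    download >> process >> upload

Note that the random choice is made when the DAG file is parsed rather than per DAG run, so every task defined in that parse shares the same queue.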

Upvotes: 2
