Mit

Reputation: 91

Run parallel tasks in Apache Airflow

I am able to configure the airflow.cfg file to run tasks one after the other.

What I want to do is execute tasks in parallel, e.g. 2 at a time, until the end of the list is reached.

How can I configure this?

Upvotes: 9

Views: 19275

Answers (1)

Taylor D. Edmiston

Reputation: 13036

Executing tasks in Airflow in parallel depends on which executor you're using, e.g., SequentialExecutor, LocalExecutor, CeleryExecutor, etc.

For a simple setup, you can achieve parallelism by just setting your executor to LocalExecutor in your airflow.cfg:

[core]
executor = LocalExecutor

Reference: https://github.com/apache/incubator-airflow/blob/29ae02a070132543ac92706d74d9a5dc676053d9/airflow/config_templates/default_airflow.cfg#L76

This will spin up a separate process for each task.

(Of course you'll need to have a DAG with at least 2 tasks that can execute in parallel to see it work.)

Alternatively, with CeleryExecutor, you can spin up any number of workers by just running (as many times as you want):

$ airflow worker

The tasks will go into a Celery queue and each Celery worker will pull off of the queue.

You might find the "Scaling Out with Celery" section of the Airflow configuration docs helpful:

https://airflow.apache.org/howto/executor/use-celery.html
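To make the queue-and-worker idea above concrete, here is a toy model in plain Python. It is not Airflow or Celery code: threads stand in for workers, a standard-library queue stands in for the Celery queue, and the names run_with_workers, t1 ... t5 are made up for illustration. It only shows how a fixed number of workers (2 here) drain a list of tasks until none are left.

```python
import queue
import threading
import time


def run_with_workers(task_names, num_workers=2):
    """Toy model: each worker pulls task names off a shared queue until it is empty."""
    tasks = queue.Queue()
    for name in task_names:
        tasks.put(name)

    lock = threading.Lock()
    finished = []  # (task name, worker id) pairs, in completion order
    state = {"running": 0, "peak": 0}

    def worker(worker_id):
        while True:
            try:
                name = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do, this worker exits
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # stand-in for the real work of a task
            with lock:
                state["running"] -= 1
                finished.append((name, worker_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished, state["peak"]


# Hypothetical task list; at most 2 tasks are ever in flight at once.
finished, peak = run_with_workers(["t1", "t2", "t3", "t4", "t5"], num_workers=2)
print(finished, peak)
```

Starting more workers in this model corresponds to running the worker command more times: the queue is shared, so adding workers raises the number of tasks that can be in flight at once.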

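For any executor, once you have it running, you may want to tweak the core settings that control parallelism.

They're all found under [core]. These are the defaults as of the commit referenced below (newer Airflow releases have renamed or removed some of these settings):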

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True

# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16

Reference: https://github.com/apache/incubator-airflow/blob/29ae02a070132543ac92706d74d9a5dc676053d9/airflow/config_templates/default_airflow.cfg#L99
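As a rough sketch of the "2 at a time" setup from the question, the snippet below uses Python's standard configparser on an in-memory copy of a few of the [core] settings quoted above. The helper name limit_parallelism and the trimmed DEFAULTS string are made up for illustration, and the SequentialExecutor starting value is an assumption about the default config. It sets the executor to LocalExecutor and lowers both parallelism (installation-wide cap) and dag_concurrency (cap within one DAG, in the Airflow versions that use this name) to 2.

```python
import configparser
import io

# A trimmed, in-memory stand-in for the [core] section of airflow.cfg.
# The executor value here is an assumed starting point, not read from a real file.
DEFAULTS = """
[core]
executor = SequentialExecutor
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
"""


def limit_parallelism(cfg_text, executor="LocalExecutor", max_tasks=2):
    """Return cfg_text with the executor switched and the task caps lowered."""
    # interpolation=None so that any '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parser["core"]["executor"] = executor
    parser["core"]["parallelism"] = str(max_tasks)      # cap across the installation
    parser["core"]["dag_concurrency"] = str(max_tasks)  # cap within a single DAG
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = limit_parallelism(DEFAULTS)
print(updated)
```

Note that configparser drops comments when it writes a file back out, so for a real airflow.cfg it is probably simpler to edit these three lines by hand; the code is only meant to show which values change.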

Upvotes: 12
