Reputation: 91
I am able to configure the airflow.cfg file to run tasks one after the other.
What I want to do is execute tasks in parallel, e.g. 2 at a time, until I reach the end of the list.
How can I configure this?
Upvotes: 9
Views: 19275
Reputation: 13036
Executing tasks in Airflow in parallel depends on which executor you're using, e.g., SequentialExecutor, LocalExecutor, CeleryExecutor, etc.
For a simple setup, you can achieve parallelism by just setting your executor to LocalExecutor in your airflow.cfg:
[core]
executor = LocalExecutor
This will spin up a separate process for each task.
(Of course you'll need to have a DAG with at least 2 tasks that can execute in parallel to see it work.)
Alternatively, with CeleryExecutor, you can spin up any number of workers by just running (as many times as you want):
$ airflow worker
The tasks will go into a Celery queue and each Celery worker will pull off of the queue.
You might find the section Scaling out with Celery in the Airflow Configuration docs helpful.
https://airflow.apache.org/howto/executor/use-celery.html
For any executor, you may want to tweak the core settings that control parallelism once you have that running.
They're all found under [core]. These are the defaults:
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16
# Are DAGs paused by default at creation
dags_are_paused_at_creation = True
# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128
# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
Upvotes: 12