Flavio

Reputation: 839

CeleryExecutor in Airflow is not parallelizing tasks in a subdag

We're using Airflow 1.10.0, and while investigating why some of our ETL processes take so long, we found that the subdags run with a SequentialExecutor instead of the BaseExecutor or the CeleryExecutor we have configured.

I would like to know whether this is a bug or expected behavior of Airflow. It doesn't make sense to have the capability to execute tasks in parallel and then lose it for one specific kind of task.

Execution of our SubDag (zoomed in on the subdag)
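
For reference, this is roughly how our subdags are built (a minimal sketch; the DAG ids, task ids and the etl_subdag factory are illustrative, not our actual code). Note that no executor is passed to the SubDagOperator:

import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime.datetime(2018, 1, 1),
}

dag = DAG('etl_parent', default_args=default_args, schedule_interval='@daily')


def etl_subdag(parent_dag_id, child_task_id):
    """Child DAG containing a handful of independent tasks."""
    subdag = DAG(
        dag_id='{}.{}'.format(parent_dag_id, child_task_id),
        default_args=default_args,
        schedule_interval='@daily',
    )
    for i in range(4):
        # No dependencies between these tasks, so we expected them to run
        # in parallel once the CeleryExecutor is configured.
        DummyOperator(task_id='load_table_{}'.format(i), dag=subdag)
    return subdag


load_tables = SubDagOperator(
    task_id='load_tables',
    subdag=etl_subdag('etl_parent', 'load_tables'),
    # No executor argument here, so the operator falls back to the
    # SequentialExecutor and the tasks above run one at a time.
    dag=dag,
)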

Upvotes: 6

Views: 2846

Answers (2)

strider

Reputation: 702

Maybe a little late, but using the LocalExecutor works for me.

from airflow.executors.local_executor import LocalExecutor
from airflow.operators.subdag_operator import SubDagOperator

subdag = SubDagOperator(
    task_id=task_id,
    subdag=my_subdag,  # the nested DAG to run, defined elsewhere
    default_args=default_args,
    executor=LocalExecutor(),  # run the subdag's tasks in parallel on the local machine
    dag=dag,
)

Upvotes: 3

Charlie Gelman

Reputation: 436

It is a typical pattern to use the SequentialExecutor in subdags, the idea being that you are often executing many similar, related tasks and don't necessarily want the added overhead of pushing each one through Celery's queues. See the "other tips" section in the Airflow docs on subdags: https://airflow.apache.org/concepts.html#subdags

By default, subdags are set to use the SequentialExecutor (see: https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/operators/subdag_operator.py#L38), but you can change that.

To use the CeleryExecutor instead, add the following to your subdag creation:

from airflow.executors.celery_executor import CeleryExecutor

mysubdag = SubDagOperator(
    executor=CeleryExecutor(),
    ...
)

Upvotes: 10
