Jill Clover
Jill Clover

Reputation: 2328

How to handle Spark Executors when number of partitions do not match no of Executors?

Let's say I have 3 executors and 4 partitions, and we assume theses number cannot be changed.

This is not an efficient setup, because we have to read 2 passes: in 1 pass, we read 3 partitions; and in the second partition, we read 1 partition.

Is there a way in Spark that we can improve the efficiency without changing the number of executors and partitions?

Upvotes: 2

Views: 1035

Answers (1)

Chandan Ray
Chandan Ray

Reputation: 2091

In your scenario you need to update the number of cores.

In spark each partition is taken up for execution by one task of spark. As you have 3 executors and 4 partitions and if you assume you have total 3 cores I.e one core per executor then 3 partition of data will be run in parallel and one partition will be taken once one core for the executor will be free. To handle this latency we need to increase spark.executor.cores=2. I.e each executor can run 2 threads at a time I.e 2 tasks at a time.

So all your partitions will be executed in parallel but it does not guarantee whether 1 executor will run 2 tasks and 2 executors will run one task each or 2 executors will run 2 tasks on 2 individual partitions with one executor will be idle.

Upvotes: 0

Related Questions