Reputation: 2328
Let's say I have 3 executors and 4 partitions, and we assume these numbers cannot be changed.
This is not an efficient setup, because we have to read in 2 passes: in the first pass, we read 3 partitions; and in the second pass, we read the remaining 1 partition.
Is there a way in Spark that we can improve the efficiency without changing the number of executors and partitions?
Upvotes: 2
Views: 1035
Reputation: 2091
In your scenario you need to update the number of cores.
In Spark, each partition is processed by one task. Since you have 3 executors and 4 partitions, and assuming one core per executor (3 cores in total), 3 partitions will be processed in parallel, and the fourth partition will only be picked up once a core on one of the executors becomes free. To remove this latency, increase spark.executor.cores=2, i.e. each executor can then run 2 threads (2 tasks) at a time.
With that setting, all 4 partitions can be executed in parallel, but there is no guarantee of how the tasks are distributed: one executor may run 2 tasks while the other two run 1 task each, or two executors may run 2 tasks each on their partitions while the third executor sits idle.
Upvotes: 0