zhengjw

Reputation: 41

How do I make all executors run my Spark program when using --num-executors?

I have four nodes to run my Spark program on and I set --num-executors 4, but the problem is that only two executors are running; the other two machines do not do any computation. Here is the executor summary:

Executor_ID  Address  ...  Total_Task  Task_Time  Input
1            slave8   ...  88          21.5s      104MB
2            slave6   ...  0           0          0B
3            slave1   ...  88          1min       99.4MB
4            slave2   ...  0           0          0B

How can I make all four of these nodes run my Spark program?
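For reference, the job is submitted along these lines (the jar, class name, and resource flags below are placeholders, not the actual ones):

    spark-submit \
      --master yarn \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 2g \
      --class com.example.MyApp \
      my-app.jar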

Upvotes: 0

Views: 145

Answers (2)

Glennie Helles Sindholt

Reputation: 13154

I'm guessing that you run on YARN. In that case, you need to set

yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator 

in the capacity-scheduler.xml file. See Apache Hadoop Yarn - Underutilization of cores. Otherwise YARN will only launch 2 executors no matter what you specify with the --num-executors flag.
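For reference, the property goes into capacity-scheduler.xml in the standard Hadoop configuration format, and you need to refresh the queues with yarn rmadmin -refreshQueues (or restart the ResourceManager) for it to take effect:

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>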

Upvotes: 1

YoYo

Reputation: 9415

  1. Executors run tasks. In Spark, the number of tasks is determined by how the data is partitioned. If you have only 2 partitions but 4 executors, only 2 executors can ever have work to do.
  2. In a standalone cluster, a node needs a worker started on it before executors can run there.
  3. You associate CPU and memory with an executor. If a node cannot provide the requested resources, the executor is queued up until those resources become available.
  4. If two nodes by themselves have enough CPU cores to do all the work, the other nodes will not be put to work. Data locality is also important, so if possible all tasks will be scheduled on one node.

I suspect that in your case this can be solved by partitioning your data better. Better does not always mean more; it also means partitioning at the right time, and in a way that can avoid some of the shuffling.
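A minimal sketch of what that looks like (assuming an RDD-based job; the input path and partition count are illustrative, not taken from the question):

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))

        // Hypothetical input path; replace with your own data.
        val data = sc.textFile("hdfs:///data/input")

        // With only 2 input splits there are only 2 partitions,
        // so at most 2 executors ever receive tasks.
        println(s"partitions before: ${data.getNumPartitions}")

        // Spread the work across all executors, e.g. a couple of
        // partitions per available core. Note: repartition() shuffles.
        val spread = data.repartition(8)
        println(s"partitions after: ${spread.getNumPartitions}")

        println(spread.map(_.length).reduce(_ + _))
        sc.stop()
      }
    }

If the source is splittable, you can also ask for more partitions up front with sc.textFile("hdfs:///data/input", 8) and avoid the extra shuffle entirely.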

Upvotes: 0
