Reputation: 41
I have four nodes to run my Spark program and set --num-executors 4, but the problem is that only two executors are doing work; the other two machines do not do any computation. Here is the executor summary:

    Executor_ID   Address   ...   Total_Tasks   Task_Time   Input
    1             slave8          88            21.5s       104MB
    2             slave6          0             0           0B
    3             slave1          88            1min        99.4MB
    4             slave2          0             0           0B
How can I make all four of these nodes run my Spark program?
Upvotes: 0
Views: 145
Reputation: 13154
I'm guessing that you run on YARN. By default, YARN's capacity scheduler uses a resource calculator that takes only memory into account when allocating containers, which can leave it launching fewer executors than you request. In that case, you need to set

    yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

in the capacity-scheduler.xml file. See Apache Hadoop Yarn - Underutilization of cores. Otherwise YARN will only launch 2 executors no matter what you specify with the --num-executors flag.
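For reference, a minimal sketch of how that property would look inside capacity-scheduler.xml, in the standard Hadoop configuration format (only this one property is shown; the rest of your file stays as-is):

    <!-- Make YARN account for vcores as well as memory when allocating containers -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>

After editing the file, refresh the scheduler (for example with yarn rmadmin -refreshQueues) or restart the ResourceManager so the change takes effect.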
Upvotes: 1
Reputation: 9415
I suspect that in your case, this can be solved by partitioning your data better. Better does not always mean more partitions. It also means partitioning at the right time, and in a way that can possibly avoid some of the shuffling. Keep in mind that each stage runs one task per partition, so if your input ends up in only a few partitions, only a few executors will ever get work; see the sketch below.
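As an illustration, here is a minimal Scala sketch of explicit repartitioning (the input/output paths, the word-count job, and the partition count of 8 are all hypothetical, chosen only to show the idea):

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RepartitionExample"))

        // Hypothetical input: a ~100MB file may come in as only a few
        // partitions by default, leaving some executors with no tasks.
        val lines = sc.textFile("hdfs:///data/input.txt")

        // One task per partition: with 4 executors, use at least 4
        // partitions (commonly 2-4x the total core count).
        val spread = lines.repartition(8)

        // A toy word count so every executor has tasks to run.
        spread
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)
          .saveAsTextFile("hdfs:///data/output")

        sc.stop()
      }
    }

Note that repartition itself triggers a shuffle; when you only need fewer partitions, coalesce avoids a full shuffle, which is exactly the kind of trade-off meant by partitioning "at the right time".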
Upvotes: 0