James Liu

Reputation: 206

Cluster-mode Spark refuses to run more than two jobs concurrently

My Spark cluster refuses to run more than two jobs simultaneously. One of the three will invariably stay stuck in the ACCEPTED state.

Hardware

4 data nodes with Spark clients, 24 GB RAM and 4 processors each

Cluster Metrics show there should be enough cores:

Apps Submitted    3
Apps Pending    1
Apps Running    2
Apps Completed    0
Containers Running   4
Memory Used    8GB
Memory Total  32GB
Memory Reserved  0B
VCores Used    4
VCores Total    8
VCores Reserved    0
Active Nodes    2
Decommissioned Nodes    0
Lost Nodes    0
Unhealthy Nodes   0
Rebooted Nodes    0

In the Application Manager you can see that the only way to run the third app is to kill a running one:

ID                              User  Name        Type   Queue    Priority  StartTime  FinishTime  State     FinalStatus  Containers  VCores  Memory(MB)  %Queue  %Cluster
application_1504018580976_0002  adm   com.x.app1  SPARK  default  0         [date]     N/A         RUNNING   UNDEFINED    2           2       5120        25.0    25.0
application_1500031233020_0090  adm   com.x.app2  SPARK  default  0         [date]     N/A         RUNNING   UNDEFINED    2           2       3072        25.0    25.0
application_1504024737012_0001  adm   com.x.app3  SPARK  default  0         [date]     N/A         ACCEPTED  UNDEFINED    0           0       0           0.0     0.0

Each running app holds 2 containers and 2 allocated vcores, which is 25% of the queue and 25% of the cluster.
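These numbers line up with the submit command below, where each app requests one driver container (1 core, 512m) and one executor container (1 core, 1G):

2 running apps x 2 containers each = 4 containers running
2 running apps x 2 vcores each     = 4 vcores used

which matches the Cluster Metrics above.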

Deployment command for all 3 apps:

/usr/hdp/current/spark2-client/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 512m \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 1G \
--class com..x.appx ../lib/foo.jar

Capacity Scheduler

yarn.scheduler.capacity.default.minimum-user-limit-percent = 100
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
yarn.scheduler.capacity.maximum-applications = 10000
yarn.scheduler.capacity.node-locality-delay = 40
yarn.scheduler.capacity.root.accessible-node-labels = *
yarn.scheduler.capacity.root.acl_administer_queue = *
yarn.scheduler.capacity.root.capacity = 100
yarn.scheduler.capacity.root.default.acl_administer_jobs = *
yarn.scheduler.capacity.root.default.acl_submit_applications = *
yarn.scheduler.capacity.root.default.capacity = 100
yarn.scheduler.capacity.root.default.maximum-capacity = 100
yarn.scheduler.capacity.root.default.state = RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor = 1
yarn.scheduler.capacity.root.queues = default

Upvotes: 1

Views: 1198

Answers (2)

tk421

Reputation: 5947

Your setting:

yarn.scheduler.capacity.maximum-am-resource-percent = 0.2

Implies:

total vcores(8) x maximum-am-resource-percent(0.2) = 1.6

1.6 gets rounded up to 2, since partial vcores make no sense. This means you can have only 2 application masters at a time, which is why you can run only 2 jobs at a time.

Solution: bump yarn.scheduler.capacity.maximum-am-resource-percent up to a higher value such as 0.5.
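A minimal sketch of the fix, assuming the property is edited directly in capacity-scheduler.xml (on an Ambari-managed HDP cluster you would change the same property through Ambari instead):

yarn.scheduler.capacity.maximum-am-resource-percent = 0.5

With 0.5 the same arithmetic gives total vcores (8) x 0.5 = 4, so up to 4 application masters, and therefore 4 jobs, can run at once. The queues can then be reloaded without restarting the ResourceManager:

yarn rmadmin -refreshQueues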

Upvotes: 1

vaquar khan

Reputation: 11449

The following parameters control parallel execution:

spark.executor.instances -> number of executors

spark.executor.cores -> number of cores per executor

spark.task.cpus -> number of CPUs to allocate for each task

https://spark.apache.org/docs/latest/submitting-applications.html
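As an illustrative sketch, these can be passed to spark-submit as --conf entries; the class name and jar path below are placeholders, not values from the question:

/usr/hdp/current/spark2-client/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.executor.instances=2 \
--conf spark.executor.cores=2 \
--conf spark.task.cpus=1 \
--class com.example.App ../lib/example.jar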

Upvotes: 0
