Reputation: 206
My Spark cluster refuses to run more than two jobs simultaneously. One of the three will invariably stay stuck in the 'ACCEPTED' state.
4 data nodes with Spark clients, 24 GB RAM, 4 processors
Apps Submitted: 3
Apps Pending: 1
Apps Running: 2
Apps Completed: 0
Containers Running: 4
Memory Used: 8 GB
Memory Total: 32 GB
Memory Reserved: 0 B
VCores Used: 4
VCores Total: 8
VCores Reserved: 0
Active Nodes: 2
Decommissioned Nodes: 0
Lost Nodes: 0
Unhealthy Nodes: 0
Rebooted Nodes: 0
ID                             User  Name        Type   Queue    Pri  Start   Finish  State     FinalStatus  Containers  VCores  Memory(MB)  %Queue  %Cluster
application_1504018580976_0002 adm   com.x.app1  SPARK  default  0    [date]  N/A     RUNNING   UNDEFINED    2           2       5120        25.0    25.0
application_1500031233020_0090 adm   com.x.app2  SPARK  default  0    [date]  N/A     RUNNING   UNDEFINED    2           2       3072        25.0    25.0
application_1504024737012_0001 adm   com.x.app3  SPARK  default  0    [date]  N/A     ACCEPTED  UNDEFINED    0           0       0           0.0     0.0
Each running app has 2 containers and 2 allocated vcores, taking 25% of the queue and 25% of the cluster.
/usr/hdp/current/spark2-client/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-cores 1 \
    --driver-memory 512m \
    --num-executors 1 \
    --executor-cores 1 \
    --executor-memory 1G \
    --class com.x.appx ../lib/foo.jar
yarn.scheduler.capacity.default.minimum-user-limit-percent = 100
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
yarn.scheduler.capacity.maximum-applications = 10000
yarn.scheduler.capacity.node-locality-delay = 40
yarn.scheduler.capacity.root.accessible-node-labels = *
yarn.scheduler.capacity.root.acl_administer_queue = *
yarn.scheduler.capacity.root.capacity = 100
yarn.scheduler.capacity.root.default.acl_administer_jobs = *
yarn.scheduler.capacity.root.default.acl_submit_applications = *
yarn.scheduler.capacity.root.default.capacity = 100
yarn.scheduler.capacity.root.default.maximum-capacity = 100
yarn.scheduler.capacity.root.default.state = RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor = 1
yarn.scheduler.capacity.root.queues = default
Upvotes: 1
Views: 1198
Reputation: 5947
Your setting:
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
Implies:
total vcores (8) x maximum-am-resource-percent (0.2) = 1.6
1.6 gets rounded up to 2, since a partial vcore makes no sense. This means you can only have 2 application masters at a time, which is why you can only run 2 jobs at a time.
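You can sanity-check this against the ResourceManager's scheduler REST API (default port 8088; the exact JSON field names for the AM limit vary by Hadoop version), which reports per-queue AM resource usage:

curl -s http://<resourcemanager-host>:8088/ws/v1/cluster/scheduler

Compare the default queue's AM resource limit with what the two running application masters already hold.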
Solution: bump up yarn.scheduler.capacity.maximum-am-resource-percent
to a higher value, such as 0.5.
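If you manage capacity-scheduler.xml by hand (on HDP you would make the same change through Ambari's YARN config instead), a minimal sketch of the change looks like this, assuming 0.5 is the value you settle on:

<property>
  <!-- allow up to 50% of the queue's resources to be used by application masters -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

Then refresh the queues so the change takes effect without a ResourceManager restart:

yarn rmadmin -refreshQueues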
Upvotes: 1
Reputation: 11449
The following parameters control how much parallelism an application gets (a sample invocation follows the link below):
spark.executor.instances -> number of executors
spark.executor.cores -> number of cores per executor
spark.task.cpus -> number of CPU cores to allocate for each task
https://spark.apache.org/docs/latest/submitting-applications.html
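For example, a sketch of a spark-submit call that sets those properties explicitly via --conf (the values here are illustrative, not a recommendation for your 8-vcore cluster):

/usr/hdp/current/spark2-client/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.executor.instances=2 \
    --conf spark.executor.cores=2 \
    --conf spark.task.cpus=1 \
    --class com.x.appx ../lib/foo.jar

Note that spark.executor.instances and spark.executor.cores correspond to the --num-executors and --executor-cores flags you are already passing.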
Upvotes: 0