Reputation: 22671
Running Spark on YARN, cluster mode.
I am submitting a Spark program like this:
spark-submit \
--class com.blablacar.insights.etl.SparkETL \
--name ${JOB_NAME} \
--master yarn \
--num-executors 1 \
--deploy-mode cluster \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 2g \
--executor-cores 20 \
toto.jar json
I can see 2 jobs running fine on 2 nodes. But I can also see 2 other jobs with just a driver container!
Is it possible to not run the driver at all if there are no resources left for the workers?
Upvotes: 4
Views: 654
Reputation: 22671
Actually, there is a setting that limits the resources allocated to Application Masters (in the case of Spark, the Application Master is the driver):
yarn.scheduler.capacity.maximum-am-resource-percent
From http://maprdocs.mapr.com/home/AdministratorGuide/Hadoop2.xCapacityScheduler-RunningPendingApps.html:
Maximum percent of resources in the cluster that can be used to run application masters - controls the number of concurrent active applications.
This way, YARN will not hand all cluster resources over to Spark drivers and will keep some available for executors. Hooray!
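For reference, this property is set in capacity-scheduler.xml on the ResourceManager (this assumes the CapacityScheduler is in use). A minimal sketch; the value 0.5 here is only an example, the default being 0.1 (10% of cluster resources for AMs):

<property>
  <!-- Cap the share of cluster resources that Application Masters (Spark drivers) may use -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

After editing the file, the queues can be reloaded without restarting YARN via yarn rmadmin -refreshQueues. There is also a per-queue variant, yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent, if you only want to cap AMs in a single queue.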
Upvotes: 3