Reputation: 103
I'm trying to configure Yarn and Spark for my 4-node cluster.
Every node has the following specs:
I have configured Yarn and Spark far enough that Spark can run the SparkPi example calculation, but this only works with the following yarn-site.xml:
<configuration>
    <property>
        <name>yarn.acl.enable</name>
        <value>0</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>ds11</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>20480</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
        <value>3600</value>
    </property>
</configuration>
And with the following spark-defaults.conf:
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://ds11:9000/spark-logs
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 2048m
spark.executor.memory 1024m
spark.yarn.am.memory 1024m
spark.executor.instances 16
spark.executor.cores 4
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://ds11:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
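For reference, I submit the example roughly like this (the exact examples jar name depends on the Spark build, so the glob is just a sketch):

spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    "$SPARK_HOME"/examples/jars/spark-examples_*.jar \
    100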
The critical parameters are yarn.scheduler.minimum-allocation-mb and spark.executor.memory.
If I set yarn.scheduler.minimum-allocation-mb to just 1537 MB or higher, YARN can no longer allocate containers for the Spark jobs, and when I start Spark I get the following diagnostics:
2018-03-01 13:12:25,295 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
2018-03-01 13:12:25,296 INFO yarn.Client: Setting up container launch context for our AM
2018-03-01 13:12:25,299 INFO yarn.Client: Setting up the launch environment for our AM container
2018-03-01 13:12:25,306 INFO yarn.Client: Preparing resources for our AM container
2018-03-01 13:12:26,722 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-03-01 13:12:29,899 INFO yarn.Client: Uploading resource file:/tmp/spark-19cf3747-6949-4117-ba92-ccde71d8b473/__spark_libs__7526053733120768643.zip -> hdfs://ds11:9000/user/nw/.sparkStaging/application_1519906323717_0001/__spark_libs__7526053733120768643.zip
2018-03-01 13:12:32,082 INFO yarn.Client: Uploading resource file:/tmp/spark-19cf3747-6949-4117-ba92-ccde71d8b473/__spark_conf__171844339516087904.zip -> hdfs://ds11:9000/user/nw/.sparkStaging/application_1519906323717_0001/__spark_conf__.zip
2018-03-01 13:12:32,167 INFO spark.SecurityManager: Changing view acls to: nw
2018-03-01 13:12:32,167 INFO spark.SecurityManager: Changing modify acls to: nw
2018-03-01 13:12:32,167 INFO spark.SecurityManager: Changing view acls groups to:
2018-03-01 13:12:32,167 INFO spark.SecurityManager: Changing modify acls groups to:
2018-03-01 13:12:32,167 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nw); groups with view permissions: Set(); users with modify permissions: Set(nw); groups with modify permissions: Set()
2018-03-01 13:12:32,175 INFO yarn.Client: Submitting application application_1519906323717_0001 to ResourceManager
2018-03-01 13:12:32,761 INFO impl.YarnClientImpl: Submitted application application_1519906323717_0001
2018-03-01 13:12:32,766 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1519906323717_0001 and attemptId None
2018-03-01 13:12:33,779 INFO yarn.Client: Application report for application_1519906323717_0001 (state: ACCEPTED)
2018-03-01 13:12:33,785 INFO yarn.Client:
client token: N/A
diagnostics: [Thu Mar 01 13:12:32 +0100 2018] Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:1537, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1519906352464
final status: UNDEFINED
tracking URL: http://ds11:8088/proxy/application_1519906323717_0001/
user: nw
2018-03-01 13:12:34,789 INFO yarn.Client: Application report for application_1519906323717_0001 (state: ACCEPTED)
2018-03-01 13:12:35,794 INFO yarn.Client: Application report for application_1519906323717_0001 (state: ACCEPTED)
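If I read the diagnostics right, the scheduler thinks the cluster has no resources at all. This is the standard YARN REST endpoint for checking what the ResourceManager actually sees (nothing specific to my setup):

curl -s http://ds11:8088/ws/v1/cluster/metrics
# the response includes totalMB and totalVirtualCores; if those are 0,
# no NodeManager has registered its resources with the ResourceManager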
When I keep yarn.scheduler.minimum-allocation-mb at 1536 MB and increase spark.executor.memory to e.g. 2048 MB, I get the following error:
2018-03-01 15:15:47,578 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (2048+384 MB) is above the max threshold (1536 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:319)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
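As far as I understand the message, Spark adds a memory overhead on top of spark.executor.memory before requesting a container. A small sketch of that arithmetic (the 10% factor with a 384 MB floor is the Spark 2.x default, as far as I know):

# rough sketch of the container size Spark requests per executor
executor_mem=2048                          # spark.executor.memory in MB
overhead=$(( executor_mem * 10 / 100 ))    # default overhead factor of 0.10 ...
(( overhead < 384 )) && overhead=384       # ... with a 384 MB minimum
echo "request: $(( executor_mem + overhead )) MB"    # 2432 MB, above the 1536 MB threshold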
When I increase both parameters, I still get the first type of error: Spark cannot allocate containers.
Does anyone have an idea what is going wrong here?
Upvotes: 1
Views: 998
Reputation: 191681
It sounds like you are editing the yarn-site.xml on the Spark client only.
If you want to change the actual YARN ResourceManager and NodeManager memory sizes, you need to rsync that file across the whole cluster and then restart the YARN services.
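Something along these lines should do it (the worker hostnames are placeholders, since only ds11 appears in your logs; adjust the paths to your install):

for host in ds12 ds13 ds14; do    # placeholder worker hostnames
    rsync -av "$HADOOP_CONF_DIR/yarn-site.xml" "$host:$HADOOP_CONF_DIR/"
done
"$HADOOP_HOME"/sbin/stop-yarn.sh     # restart so the new limits take effect
"$HADOOP_HOME"/sbin/start-yarn.sh

Afterwards, the "Memory Total" shown at http://ds11:8088 should read 4 x 20480 MB instead of 0.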
P.S. Set up an HA ResourceManager if you don't have one already.
Upvotes: 1