Reputation: 645
I have HADOOP_HOME, HADOOP_CONF_DIR, and YARN_CONF_DIR all defined in the spark-env.sh script. However, when I try to start a Spark session on YARN with
val sess = new SparkConf().setMaster("yarn-client").setAppName("default")
it times out:
23:36:44.219 [run-main-0] DEBUG o.a.h.i.retry.RetryInvocationHandler - Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over null. Retrying after sleeping for 30000ms.
java.net.ConnectException: Call From ip-10-122-2-155/10.122.2.155 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
There's nothing running locally with port 8032 open, so it obviously times out.
My yarn-site.xml explicitly sets the RM address:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>10.122.2.195:8032</value>
</property>
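For reference, 0.0.0.0:8032 is the built-in default for yarn.resourcemanager.address, so the client is evidently not finding yarn-site.xml on its classpath. A quick way to check what the JVM actually resolves (a minimal sketch, assuming the Hadoop YARN client libraries are already on the classpath):

import org.apache.hadoop.yarn.conf.YarnConfiguration

object CheckRmAddress {
  def main(args: Array[String]): Unit = {
    // Prints 0.0.0.0:8032 (the default) when yarn-site.xml is not visible,
    // and the configured 10.122.2.195:8032 when it is.
    val conf = new YarnConfiguration()
    println(conf.get(YarnConfiguration.RM_ADDRESS))
  }
}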
Upvotes: 0
Views: 1441
Reputation: 645
I fixed this problem by adding the following lines to the build.sbt file.
unmanagedClasspath in Compile += file("/home/ubuntu/hadoop-2.6.0/etc/hadoop")
unmanagedClasspath in Runtime += file("/home/ubuntu/hadoop-2.6.0/etc/hadoop")
With the other environment variables accounted for, this allowed the sbt project to pick up the YARN configuration.
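With that in place, a minimal driver along these lines (a sketch, assuming Spark 1.x, where the "yarn-client" master string is valid) connects to the ResourceManager at the configured address:

import org.apache.spark.{SparkConf, SparkContext}

object YarnClientApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("yarn-client").setAppName("default")
    val sc = new SparkContext(conf)
    // Trivial job, just to confirm the application actually reaches YARN.
    println(sc.parallelize(1 to 10).sum())
    sc.stop()
  }
}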
Upvotes: 0
Reputation: 2333
Your driver program is not able to access the variables defined in spark-env.sh (assuming you are not running spark-shell).
A possible reason is that the user running the driver is different from the user who owns the Spark installation files.
Try manually setting the variables from spark-env.sh before running your driver, as follows:
source spark-env.sh
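To confirm the driver process actually sees them, a quick check from the driver code (a minimal sketch):

// If either of these prints None, the driver JVM never inherited
// the variables defined in spark-env.sh.
println(sys.env.get("HADOOP_CONF_DIR"))
println(sys.env.get("YARN_CONF_DIR"))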
Upvotes: 1