Anisotropic

Reputation: 645

Spark not picking up hadoop conf

I have HADOOP_HOME, HADOOP_CONF_DIR, and YARN_CONF_DIR all defined in the spark-env.sh script. However, when I try to create a SparkSession on YARN with

val sess = new SparkConf().setMaster("yarn-client").setAppName("default")

it times out:

23:36:44.219 [run-main-0] DEBUG o.a.h.i.retry.RetryInvocationHandler - Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over null. Retrying after sleeping for 30000ms.
java.net.ConnectException: Call From ip-10-122-2-155/10.122.2.155 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

There's nothing listening locally on port 8032, so the connection is refused; the client is clearly falling back to the default ResourceManager address (0.0.0.0:8032) instead of the one I configured.

My yarn-site.xml explicitly sets the ResourceManager address:

    <property>
      <name>yarn.resourcemanager.address</name>
      <value>10.122.2.195:8032</value>
    </property>
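
For reference, a minimal check of which ResourceManager address the driver actually resolves (just a sketch; it assumes the Hadoop YARN client jars are already on the driver's classpath):

    import org.apache.hadoop.yarn.conf.YarnConfiguration

    // YarnConfiguration loads yarn-default.xml and, if it is visible on the
    // classpath, yarn-site.xml; this is the same lookup the YARN client performs.
    val yarnConf = new YarnConfiguration()

    // Typically prints 0.0.0.0:8032 (the yarn-default.xml value) when
    // yarn-site.xml is not found, and 10.122.2.195:8032 when it is.
    println(yarnConf.get(YarnConfiguration.RM_ADDRESS))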

Upvotes: 0

Views: 1441

Answers (2)

Anisotropic

Reputation: 645

I fixed this problem by adding the following lines to the build.sbt file.

unmanagedClasspath in Compile += file("/home/ubuntu/hadoop-2.6.0/etc/hadoop")

unmanagedClasspath in Runtime += file("/home/ubuntu/hadoop-2.6.0/etc/hadoop")

With the other environment variables accounted for, this allowed the sbt-launched driver to pick up the YARN configuration.
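
For anyone on a newer sbt, the equivalent in sbt 1.x slash syntax (assuming the same Hadoop configuration path) should be:

    // build.sbt (sbt 1.x): add the Hadoop conf directory to both classpaths
    // so that yarn-site.xml is visible when the application runs from sbt.
    Compile / unmanagedClasspath += file("/home/ubuntu/hadoop-2.6.0/etc/hadoop")
    Runtime / unmanagedClasspath += file("/home/ubuntu/hadoop-2.6.0/etc/hadoop")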

Upvotes: 0

Darshan

Reputation: 2333

Your driver program is not able to access the variables defined in spark-env.sh (assuming you are not running spark-shell).

One possible reason is that the user running the driver is different from the user who owns the Spark installation files.

Try manually setting the variables from spark-env.sh before running your driver, as follows:

source spark-env.sh 
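
To confirm that the driver process actually sees these variables, here is a quick check in plain Scala (just a sketch; no extra dependencies):

    // Prints what the driver process inherited; None means spark-env.sh was
    // not sourced in the environment that launched the driver.
    Seq("HADOOP_HOME", "HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { name =>
      println(s"$name = ${sys.env.get(name)}")
    }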

Upvotes: 1
