John

Reputation: 353

Resource Manager Has No Nodes

EDIT: I have looked at YARN Resourcemanager not connecting to nodemanager and the solution there does not work for me. Below is the section of the node-manager log where it tries to connect to the resource manager:

[main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8031

2016-06-17 19:01:04,697 INFO  [main] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNMContainerStatuses(429)) - Sending out 0 NM container statuses: []

2016-06-17 19:01:04,701 INFO  [main] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(268)) - Registering with RM using containers :[]

2016-06-17 19:01:05,815 INFO  [main] ipc.Client (Client.java:handleConnectionFailure(867)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2016-06-17 19:01:06,816 INFO  [main] ipc.Client (Client.java:handleConnectionFailure(867)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

For some reason it says it is connecting to 0.0.0.0. When I ssh into one of the data nodes and ping resource-manager, I get a response, so the nodes are able to resolve the hostname.

This leads me to believe that an option is incorrect in my yarn-site.xml, as my nodes are trying to connect to 0.0.0.0:8031 instead of resource-manager:8031. Notably, 0.0.0.0:8031 is the built-in default for yarn.resourcemanager.resource-tracker.address, which suggests my yarn-site.xml is not being read at all.
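One way to confirm which configuration a daemon actually loaded is its /conf servlet. A minimal sketch, assuming the default NodeManager web UI port 8042 and that one of the node-manager hosts is reachable as node4:

# Print the resource-tracker address the running NodeManager loaded
# (output formatting varies by Hadoop version). If this shows
# 0.0.0.0:8031, the yarn-site.xml below is not being picked up.
curl -s http://node4:8042/conf | grep resource-tracker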


I am running a Cloudera Hadoop cluster on Docker and am having issues with the YARN resource manager being able to see the other nodes. The way it is set up is as follows:

Node 1 - Namenode (hadoop-hdfs-namenode)

Node 2 - Secondary Namenode (hadoop-hdfs-secondarynamenode)

Node 3 - Yarn Resource-Manager (hadoop-yarn-resourcemanager)

Node 4 - datanode and node manager (hadoop-hdfs-datanode, hadoop-yarn-nodemanager)

Node 5 - datanode and node manager (hadoop-hdfs-datanode, hadoop-yarn-nodemanager)

When I go to namenode:50070 I am able to see both datanodes. However, when I go to resource-manager:8088 it shows I have zero nodes. My yarn-site.xml file, which is present on every node, is as follows:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>resource-manager:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>resource-manager:8030</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
  </property>
  <property>
    <name>yarn.log.aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://namenode:8020/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>resource-manager:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>resource-manager:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>resource-manager:8033</value>
  </property>
  <property>
    <description>
    Number of seconds after an application finishes before the nodemanager's
    DeletionService will delete the application's localized file directory
    and log directory.

    To diagnose Yarn application problems, set this property's value large
    enough (for example, to 600 = 10 minutes) to permit examination of these
    directories. After changing the property's value, you must restart the
    nodemanager in order for it to have an effect.

    The roots of Yarn applications' work directories is configurable with
    the yarn.nodemanager.local-dirs property (see below), and the roots
    of the Yarn applications' log directories is configurable with the
    yarn.nodemanager.log-dirs property (see also below).
    </description>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>600</value>
  </property>
</configuration>
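For reference, the same information is available from the command line; a minimal sketch, assuming the yarn client scripts are installed and pointed at this configuration:

# Ask the ResourceManager which NodeManagers have registered with it.
# In the broken state described above this reports a total of 0 nodes.
yarn node -list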

Does anyone have any ideas as to why this is the case?

Thanks for reading.

Upvotes: 2

Views: 2039

Answers (2)

John

Reputation: 353

As indicated in the edit, it appeared as if the yarn-site.xml was not being picked up and only defaults were taking effect. I solved this by copying the yarn-site.xml file into every directory on the machine as user root. I then ran the node-manager so that it would error while reading the file, since it does not run as user root. The log directed me to where it expected the file, which was a YARN-specific directory instead of the general Hadoop directory.
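As a rough sketch of that fix (both paths are examples only; the real target directory is whatever the node-manager's error log points at):

# Copy the config into the directory the NodeManager actually reads,
# then restart the service so the file is re-read.
cp /etc/hadoop/conf/yarn-site.xml /etc/hadoop/conf.cloudera.yarn/
service hadoop-yarn-nodemanager restart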

Upvotes: 1

afuc func
afuc func

Reputation: 972

Specify the ResourceManager hostname; all of the yarn.resourcemanager.*.address values default to ports on this host (here master-1; in this question's setup it would be resource-manager):

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master-1</value>
</property>

Upvotes: 1
