B.Mr.W.

Reputation: 19628

Cannot Start A Certain Node Manager After Decommissioning Some Nodes

I have a cluster with 1 namenode and 6 datanodes. After decommissioning 3 of the datanodes, our YARN service is always reported as being in bad health, and it seems the NodeManager on one of the datanodes never starts successfully. I then tried to restart the NodeManager on that box, and here are the logs.

2014-08-01 11:19:08,217 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2014-08-01 11:19:08,217 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from  box708.datafireball.com, Sending SHUTDOWN signal to the NodeManager.
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:185)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:352)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:398)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from  box708.datafireball.com, Sending SHUTDOWN signal to the NodeManager.
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:255)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:179)
    ... 6 more

I googled this error but could not find a solution. Can anyone offer guidance?

Upvotes: 0

Views: 3166

Answers (2)

Kevin Vasko

Reputation: 1645

buryat is correct. I had the same problem, and the fix was to add all the nodes to the include list. I would like to add one note for anyone who runs into this issue.

Make sure you add EXACTLY the hostname that YARN is complaining about. In your example that is box708.datafireball.com, taken from the message "Disallowed NodeManager from box708.datafireball.com".

In my case I was adding a node named "gpu-0-5". The hostname "gpu-0-5" was in my yarn.include file, yet YARN kept complaining. I then noticed that the error message said "gpu-0-5.local" (even though gpu-0-5 resolves to the same machine). Once I added gpu-0-5.local to my yarn.include list, it started working.

I'm not sure how to configure YARN so that it only requires "gpu-0-5".
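
The mismatch described above can be spotted before restarting anything. Here is a minimal sketch, assuming you copy the hostname from the "Disallowed NodeManager from ..." message and already have the include-file entries as a list. The function name check_include_entry is made up for illustration and is not part of Hadoop.

```python
def check_include_entry(reported_name, include_entries):
    """Compare the hostname from the RM error message with the include list.

    Returns (exact_match, near_misses), where near_misses are entries that
    share the same short name (the part before the first dot) but are not
    character-for-character identical to the reported name.
    """
    short_name = reported_name.split(".")[0]
    exact_match = reported_name in include_entries
    near_misses = [
        entry
        for entry in include_entries
        if entry != reported_name and entry.split(".")[0] == short_name
    ]
    return exact_match, near_misses


# Hostname as it appeared in the error message vs. what was in yarn.include.
exact, close = check_include_entry("gpu-0-5.local", ["gpu-0-5", "gpu-0-6"])
print(exact)  # False
print(close)  # ['gpu-0-5']
```

A result of False together with a non-empty near-miss list is the situation I hit: the node is listed, but not under the name it registers with.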

Upvotes: 0

buryat

Reputation: 384

Message from ResourceManager: Disallowed NodeManager

This message means that either your NodeManager is not in the ResourceManager's include list of allowed nodes, or it is in the exclude list.

Check your ResourceManager configuration for the following properties:

yarn.resourcemanager.nodes.include-path

yarn.resourcemanager.nodes.exclude-path
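
The rule behind these two properties can be sketched roughly as follows. This is a simplified model, not the ResourceManager's actual code: it assumes that an empty include list means every node is allowed, and it ignores the fact that the real check may also match on IP address. The hostnames box701 and box702 are hypothetical; only box708.datafireball.com comes from the question.

```python
def parse_hosts(text):
    """Split the contents of an include/exclude file into a set of entries.

    Assumption: entries are separated by whitespace or newlines.
    """
    return set(text.split())


def is_node_allowed(hostname, include_hosts, exclude_hosts):
    """Simplified admission rule: included (or no include list) and not excluded."""
    in_include = not include_hosts or hostname in include_hosts
    in_exclude = hostname in exclude_hosts
    return in_include and not in_exclude


# Hypothetical include file that is missing the node from the question.
include = parse_hosts("box701.datafireball.com\nbox702.datafireball.com\n")
exclude = parse_hosts("")

print(is_node_allowed("box708.datafireball.com", include, exclude))  # False

# Adding the exact hostname to the include list makes the node acceptable.
include.add("box708.datafireball.com")
print(is_node_allowed("box708.datafireball.com", include, exclude))  # True
```

After editing either file, the ResourceManager has to reread it, for example with `yarn rmadmin -refreshNodes`, before the NodeManager is restarted.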

Upvotes: 2
