Reputation: 19628
I have a cluster with 1 NameNode and 6 DataNodes. After I decommissioned 3 of the DataNodes, our YARN service has been in bad health, and it seems the NodeManager on one of the remaining DataNodes never starts successfully. I then tried to restart the NodeManager on that box, and here are the logs:
2014-08-01 11:19:08,217 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2014-08-01 11:19:08,217 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from box708.datafireball.com, Sending SHUTDOWN signal to the NodeManager.
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:185)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:352)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:398)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from box708.datafireball.com, Sending SHUTDOWN signal to the NodeManager.
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:255)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:179)
	... 6 more
I googled this error but couldn't find a solution. Can anyone offer some guidance?
Upvotes: 0
Views: 3166
Reputation: 1645
buryat is correct. I had the same problem, and the fix was to add all the nodes to the include list. But I'd like to add a note for anyone who runs into this issue.
Make sure to add EXACTLY the hostname that YARN is complaining about. In your example, that is box708.datafireball.com, from the message "ResourceManager: Disallowed NodeManager from box708.datafireball.com".
In my case I was adding a node named "gpu-0-5". The hostname "gpu-0-5" was in my yarn.include file, but YARN kept complaining. Then I noticed the message said "gpu-0-5.local" (even though gpu-0-5 resolves to the same machine). Once I added gpu-0-5.local to my yarn.include list, it started working.
I'm not sure how to configure YARN so that it accepts just "gpu-0-5".
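To make the "exact hostname" point concrete, here is a small hypothetical helper (not part of Hadoop) that pulls the hostname out of the "Disallowed NodeManager from <host>," message and reports it if that exact string is missing from your include list. The sample log line is constructed to follow the message format shown in the question, using the gpu-0-5.local name from my case; the function names are made up for illustration.

```python
import re

# Hypothetical helper: the hostname follows "Disallowed NodeManager from "
# and ends at the next comma or whitespace, as in the logs above.
DISALLOWED_RE = re.compile(r"Disallowed NodeManager from ([^\s,]+)")


def disallowed_host(log_line):
    """Return the hostname the ResourceManager rejected, or None if absent."""
    match = DISALLOWED_RE.search(log_line)
    return match.group(1) if match else None


def missing_from_include(log_line, include_hosts):
    """Return the rejected hostname if it is not listed verbatim, else None."""
    host = disallowed_host(log_line)
    # Plain string comparison: "gpu-0-5" does NOT match "gpu-0-5.local".
    if host is not None and host not in include_hosts:
        return host
    return None


# Constructed example line, following the format of the question's log.
line = ("Message from ResourceManager: Disallowed NodeManager from "
        "gpu-0-5.local, Sending SHUTDOWN signal to the NodeManager.")
print(disallowed_host(line))                    # gpu-0-5.local
print(missing_from_include(line, {"gpu-0-5"}))  # gpu-0-5.local
```

Whatever string this prints is the one to copy into the include file, even if a shorter name resolves to the same machine.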
Upvotes: 0
Reputation: 384
Message from ResourceManager: Disallowed NodeManager
This message means that your NodeManager is either not in the list of allowed NodeManagers or is in the list of excluded ones.
Check your ResourceManager configuration for the following properties:
yarn.resourcemanager.nodes.include-path
yarn.resourcemanager.nodes.exclude-path
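As a rough illustration of how those two files interact, here is a simplified sketch of the allow/deny decision. It assumes that an empty include list admits every host, that any host listed in the exclude file is rejected, and that names are compared as exact strings. The real ResourceManager code may also compare other forms of the name (for example the resolved IP address), which this sketch omits. The function names and the short name "box708" are hypothetical.

```python
def read_hosts_file(text):
    """Parse hosts-file contents: whitespace-separated names, '#' starts a comment."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop everything after a comment marker
        hosts.update(line.split())    # allow one or several names per line
    return hosts


def is_allowed(hostname, include, exclude):
    """Assumed rule: (include empty OR host listed) AND host not excluded."""
    in_include = not include or hostname in include
    return in_include and hostname not in exclude


# Hypothetical include file that lists only the short name.
include = read_hosts_file("box708  # short name only\n")
exclude = read_hosts_file("")
print(is_allowed("box708.datafireball.com", include, exclude))  # False
```

Under these assumptions, after a decommission it is worth checking both that the surviving node's exact reported name is still in the include file and that it was not left in the exclude file.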
Upvotes: 2