tony marbo

Reputation: 251

HADOOP datanode strange things

I think I must be misunderstanding how DataNodes work in a Hadoop cluster. I have a virtual Hadoop cluster composed of master, slave1, slave2, and slave3. Master and slave1 run on one physical machine, while slave2 and slave3 run on another. When I start the cluster, the HDFS web UI shows only three live DataNodes: master, slave1, and slave2. But sometimes the three live DataNodes are master, slave1, and slave3. That's strange. I ssh'd into the node whose DataNode had not started. Although jps showed no DataNode process there, I could still copy and delete files on HDFS from that node. So I believe I must not understand DataNodes correctly. I have three questions:

1. Is there one DataNode per node?
2. Why can a node that is not running a DataNode still read and write on HDFS?
3. Can we decide the number of DataNodes?

Here is the log of the unstarted datanode:


STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = slave11/192.168.111.31
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May  8 20:31:25 UTC 2012
************************************************************/
2012-08-03 17:47:07,578 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-03 17:47:07,595 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-08-03 17:47:07,911 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-03 17:47:07,915 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-08-03 17:47:09,457 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 0 time(s).
2012-08-03 17:47:10,460 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 1 time(s).
2012-08-03 17:47:11,464 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 2 time(s).
2012-08-03 17:47:19,565 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2012-08-03 17:47:19,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2012-08-03 17:47:19,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2012-08-03 17:47:24,721 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-08-03 17:47:24,854 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-03 17:47:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2012-08-03 17:47:24,953 INFO org.mortbay.log: jetty-6.1.26
2012-08-03 17:47:25,665 INFO org.mortbay.log: Started [email protected]:50075

2012-08-03 17:47:25,688 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-08-03 17:47:25,690 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2012-08-03 17:47:30,717 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2012-08-03 17:47:30,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave11:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)
2012-08-03 17:47:30,764 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2012-08-03 17:47:30,766 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:30,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2012-08-03 17:47:30,778 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2012-08-03 17:47:30,772 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2012-08-03 17:47:30,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2012-08-03 17:47:30,816 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 52ms
2012-08-03 17:47:30,838 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 32 ms
2012-08-03 17:47:30,840 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 2 ms
2012-08-03 17:47:31,158 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6072482390929551157_78209
2012-08-03 17:47:33,775 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 1 ms
2012-08-03 17:47:33,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.111.31:50010 is attempting to report storage ID DS-1062340636-127.0.0.1-50010-1339803955209. Node 192.168.111.32:50010 is expected to serve this storage.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:4608)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3460)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1001)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

    at org.apache.hadoop.ipc.Client.call(Client.java:1070)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy5.blockReport(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:958)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
    at java.lang.Thread.run(Thread.java:636)

2012-08-03 17:47:33,873 INFO org.mortbay.log: Stopped [email protected]:50075
2012-08-03 17:47:33,980 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,982 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:170)
    at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:102)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
    at java.lang.Thread.run(Thread.java:636)

2012-08-03 17:47:33,982 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2012-08-03 17:47:33,983 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder

2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-08-03 17:47:33,984 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Exiting DataBlockScanner thread.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:33,987 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
    at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:522)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:737)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
    at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2012-08-03 17:47:33,988 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
    at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2067)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:799)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
    at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2012-08-03 17:47:33,989 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

Upvotes: 1

Views: 4047

Answers (1)

Donald Miner

Reputation: 39893

Running several DataNodes under a single hostname causes problems. You say the cluster is virtual, so is each node on its own virtual machine? If so, this shouldn't be a problem...

I would check the DataNode logs on slave2 and slave3 to see why one of them isn't starting. The error message will be printed there; it may say something along the lines of a port already being taken.
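If the logs are long, a small script can pull out the line that explains the shutdown. The following is only a rough sketch in Python: the function names are made up for illustration, and the regular expression is modelled on the one UnregisteredDatanodeException message that appears in the log posted in the question, so other failure messages would need their own patterns. In that posted log, the NameNode says it expects 192.168.111.32 to serve the storage ID that 192.168.111.31 is reporting.

```python
import re

# Assumption: this pattern mirrors the UnregisteredDatanodeException text quoted
# in the question's log; it is not an exhaustive list of DataNode failure messages.
STORAGE_CONFLICT = re.compile(
    r"Data node (?P<reporting>\S+) is attempting to report storage ID "
    r"(?P<storage_id>\S+)\. Node (?P<expected>\S+) is expected to serve this storage"
)


def find_shutdown_reasons(log_text):
    """Return WARN/ERROR/FATAL lines that mention the DataNode shutting down."""
    reasons = []
    for line in log_text.splitlines():
        if re.search(r"\b(WARN|ERROR|FATAL)\b", line) and "shutting down" in line:
            reasons.append(line.strip())
    return reasons


def find_storage_conflict(log_text):
    """Return the reporting node, storage ID and expected node, or None if absent."""
    match = STORAGE_CONFLICT.search(log_text)
    return match.groupdict() if match else None


# Two lines copied from the log in the question, used here as sample input.
SAMPLE_LOG = (
    "2012-08-03 17:47:33,775 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Reconciled asynchronous block report against current state in 1 ms\n"
    "2012-08-03 17:47:33,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: "
    "org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: "
    "Data node 192.168.111.31:50010 is attempting to report storage ID "
    "DS-1062340636-127.0.0.1-50010-1339803955209. "
    "Node 192.168.111.32:50010 is expected to serve this storage.\n"
)

for reason in find_shutdown_reasons(SAMPLE_LOG):
    print(reason)
print(find_storage_conflict(SAMPLE_LOG))
```

To run it against a real log, read the DataNode log file into a string and pass that in place of SAMPLE_LOG.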


You don't need to be on a DataNode to access HDFS. The HDFS client (such as hadoop fs -put) directly communicates with the NameNode and other DataNode processes without ever having to access the local one.

It is actually quite common on large clusters to have a separate "query node" that has access to HDFS and MapReduce, but isn't running any DataNode or TaskTracker services.

As long as you have the Hadoop packages installed and the configuration files point to the NameNode and JobTracker correctly, you can access your cluster "remotely".
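To check what a client node's configuration actually points at, you can read the property out of the site file. Below is a minimal sketch in Python, under a few assumptions: `read_hadoop_property` is a hypothetical helper, the sample XML is invented for illustration, and the address `hdfs://master:54310` is only inferred from the `master/192.168.111.21:54310` connection attempts in the question's log. In Hadoop 1.x the NameNode address is set by `fs.default.name` in core-site.xml, and the JobTracker address by `mapred.job.tracker` in mapred-site.xml.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a named property from Hadoop *-site.xml content, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical core-site.xml content; the host and port are assumptions taken
# from the connection attempts shown in the question's log.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
"""

print(read_hadoop_property(SAMPLE_CORE_SITE, "fs.default.name"))  # prints hdfs://master:54310
```

If the value printed on the client node matches the NameNode address the rest of the cluster uses, commands such as hadoop fs -put on that node should reach the same HDFS, whether or not a DataNode runs locally.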

Upvotes: 3
