Reputation: 6686
I am running a single-node cluster. The NameNode always fails to start when I start the cluster. I get the following error:
2013-06-29 10:37:29,968 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:627)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:594)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
2013-06-29 10:37:29,971 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-06-29 10:37:29,973 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at traw-pc/127.0.0.1
************************************************************/
I know there is a similar question, and the error can be resolved by formatting the NameNode. But my question is: why do I get this error every time? It is not a big concern since I am running a single-node cluster, but in a real production environment this could cause data loss. My guess is that it happens because I am using the /tmp directory.
Upvotes: 4
Views: 4376
Reputation: 1511
The error is due to a missing or corrupted HDFS directory. As mentioned by Arafath, this is likely because the default /tmp directory is used, which is cleared at reboot time.
To fix this, add or change the dfs.name.dir property (dfs.namenode.name.dir in Hadoop 2.x) in the hdfs-site.xml file and point it to something like file:///opt/dfs-data.
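A minimal sketch of the hdfs-site.xml entry, assuming /opt/dfs-data exists and is writable by the user running the NameNode (the path is only an example):
<!-- hdfs-site.xml: keep NameNode metadata out of /tmp so it survives reboots -->
<property>
  <name>dfs.namenode.name.dir</name>  <!-- use dfs.name.dir on Hadoop 1.x -->
  <value>file:///opt/dfs-data</value>
</property>
Remember to restart the NameNode after changing the property.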
To recover from this error, you need to restore the NameNode metadata from an fsimage backup. For large volumes, recovering from a backup can take days or even weeks, so it is highly recommended to save checkpoints from time to time so that recovery from the backup is faster.
If you are using a local demo Hadoop, you may just format and start again with these commands:
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode -format
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
IMPORTANT: these commands will delete any previous content, so obviously try the backup recovery first.
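Note that the exact form of the format step can vary between releases; if your hadoop-daemon.sh does not pass the -format flag through, the equivalent sequence (assuming hdfs is on your PATH) is to format directly and then start the daemon:
hdfs namenode -format
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode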
For saving regular checkpoints once the name node is healthy again, the commands are:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
The cluster needs to be in safe mode (effectively read-only) while the checkpoint is saved. Checkpointing is rather fast, so a couple of minutes of maintenance every month should be more than OK.
Upvotes: 0
Reputation: 11
This can be resolved by pointing your NameNode directory to a different location in hdfs-site.xml in your Hadoop configuration. By default it uses file://${hadoop.tmp.dir}/dfs/name, which sits under /tmp, so after every reboot the /tmp directory is cleared and the NameNode data is gone.
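To double-check which directory the NameNode is actually configured to use, you can query the effective configuration (assuming Hadoop 2.x, where the property is dfs.namenode.name.dir):
hdfs getconf -confKey dfs.namenode.name.dir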
Upvotes: 1