Abbas Gadhia

Reputation: 15090

Namenode format does not free up datanode disk space

After shutting down the cluster with ./stop-all.sh and then running hadoop namenode -format, I see that the datanodes still use the same amount of disk space, i.e. the space has not been freed up.

Why is that?

Upvotes: 4

Views: 11513

Answers (3)

Abbas Gadhia

Reputation: 15090

Formatting the namenode does not clean up the disk space used by the datanodes. You have to do that manually.

To do that,

First, stop the cluster by invoking ./stop-all.sh, or ./stop-mapred.sh followed by ./stop-dfs.sh.

Then delete the data directory on each datanode, i.e. either the directory specified by dfs.data.dir in hdfs-site.xml or, if that is not set, hadoop.tmp.dir/dfs/data.
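As a rough sketch of the cleanup step, the helper below empties a datanode storage directory while leaving the directory itself in place. The function name clear_datanode_dir and the example path are hypothetical, and the sketch assumes a find that supports -mindepth and -delete (GNU or BSD). Run it only after the cluster has been stopped, and on every datanode.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): empties a datanode storage
# directory, i.e. the value of dfs.data.dir or hadoop.tmp.dir/dfs/data.
clear_datanode_dir() {
    data_dir="$1"

    # Refuse obviously dangerous arguments.
    if [ -z "$data_dir" ] || [ "$data_dir" = "/" ]; then
        echo "refusing to clear '$data_dir'" >&2
        return 1
    fi
    if [ ! -d "$data_dir" ]; then
        echo "not a directory: $data_dir" >&2
        return 1
    fi

    # Delete everything below the directory (hidden files included),
    # but keep the directory itself. Assumes GNU or BSD find.
    find "$data_dir" -mindepth 1 -delete
}

# Example usage (the path is an assumption, substitute your own):
#   ./stop-all.sh
#   clear_datanode_dir /path/to/dfs/data
```

Keeping the directory itself means its ownership and permissions stay as they were when the datanode is started again.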

Running -rmr before the format (as suggested in one of the other answers to this question) is actually the best way, unless, like me, you unknowingly formatted the namenode and only THEN realized that the datanode space doesn't get cleaned up ;)

Upvotes: 3

user2486495

Reputation: 1729

You can manually delete the data stored on the DataNodes before formatting the NameNode, using rmr:

rmr

Usage: hadoop fs -rmr URI [URI …]

Recursive version of delete. Example:

hadoop fs -rmr /user/hadoop/dir
hadoop fs -rmr hdfs://nn.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.


Alternatively

Data-nodes should be reformatted whenever the name-node is. I see 2 approaches here:

  1. In order to reformat the cluster we call "start-dfs -format" or make a special script "format-dfs". This would format the cluster components all together. The question is whether it should start the cluster after formatting?
  2. Format the name-node only. When data-nodes connect to the name-node it will tell them to format their storage directories if it sees that the namespace is empty and its cTime=0. The drawback of this approach is that we can lose blocks of a data-node from another cluster if it connects by mistake to the empty name-node.

https://issues.apache.org/jira/browse/HDFS-107

Upvotes: 3

vishnu viswanath

Reputation: 3854

Formatting a Namenode won't format the Datanode.

It will just format the contents of your namenode, i.e. your namenode will no longer know where your data is. In addition, namenode -format assigns a new namespace ID to the namenode.

You will have to change the namespaceID on each datanode to match the new one before the datanode will work again. It is stored in dfs/data/current/VERSION.
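The namespaceID change described above could be scripted roughly as follows. This is only a sketch: the function names get_namespace_id and set_namespace_id are hypothetical, it assumes the VERSION file is a plain key=value file containing a namespaceID= line, and it assumes the namenode keeps its own VERSION file under dfs/name/current/. Stop the datanode before editing its file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for VERSION files that
# contain a line of the form "namespaceID=<number>".

# Print the namespaceID stored in the given VERSION file.
get_namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Rewrite the namespaceID line of a VERSION file, leaving all other
# lines untouched.
set_namespace_id() {
    version_file="$1"
    new_id="$2"

    if [ -z "$new_id" ]; then
        echo "no namespaceID given" >&2
        return 1
    fi

    # Write to a temporary file first, then replace the original.
    tmp_file="$version_file.tmp"
    sed "s/^namespaceID=.*/namespaceID=$new_id/" "$version_file" > "$tmp_file" &&
        mv "$tmp_file" "$version_file"
}

# Example usage (both paths are assumptions, substitute your own):
#   nn_id=$(get_namespace_id /path/to/dfs/name/current/VERSION)
#   set_namespace_id /path/to/dfs/data/current/VERSION "$nn_id"
```

Note that this only makes the datanode connect again. The old blocks remain on disk, so it does not free any space.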

There is an open JIRA, HDFS-107, which suggests formatting the datanodes as well when you format the namenode.

Upvotes: 2
