Learner

Reputation: 21393

What does the command "hadoop namenode -format" do?

I am trying to learn Hadoop by following a tutorial, and I am setting up pseudo-distributed mode on my machine.

My core-site.xml is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
      <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.       
      </description>   
   </property>
</configuration>

My hdfs-site.xml file is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>The actual number of replications can be specified when the
        file is created.
      </description>
   </property>
</configuration>

My mapred-site.xml file is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>      
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
      <description>The host and port that the MapReduce job tracker runs
        at.
      </description>
   </property>
</configuration>

The command below ran successfully, but what is it actually doing?

hadoop-1.2.1$ bin/hadoop namenode -format
14/11/26 12:37:16 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = myhost/127.0.0.8
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
14/11/26 12:37:17 INFO util.GSet: Computing capacity for map BlocksMap
14/11/26 12:37:17 INFO util.GSet: VM type       = 64-bit
14/11/26 12:37:17 INFO util.GSet: 2.0% max memory = 932118528
14/11/26 12:37:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/11/26 12:37:17 INFO util.GSet: recommended=2097152, actual=2097152
14/11/26 12:37:17 INFO namenode.FSNamesystem: fsOwner=myuser
14/11/26 12:37:17 INFO namenode.FSNamesystem: supergroup=supergroup
14/11/26 12:37:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/11/26 12:37:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/11/26 12:37:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/11/26 12:37:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/11/26 12:37:17 INFO namenode.NameNode: Caching file names occuring more than 10 times 
14/11/26 12:37:17 INFO common.Storage: Image file /tmp/hadoop-myuser/dfs/name/current/fsimage of size 115 bytes saved in 0 seconds.
14/11/26 12:37:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO common.Storage: Storage directory /tmp/hadoop-myuser/dfs/name has been successfully formatted.
14/11/26 12:37:18 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at chaitanya-OptiPlex-3010/127.0.0.8
************************************************************/

Can someone please explain what it is doing internally?

I have gone through these posts, but neither gives a clear explanation:

What exactly is hadoop namenode formatting?

hadoop namenode is not formatting

How can I check this practically on my machine, so that I can see the differences before and after running the command? I am new to Hadoop, so this may be a trivial question.

Upvotes: 27

Views: 73211

Answers (5)

Kumar

Reputation: 3990

Formatting the namenode does not format the datanode.

It only wipes the contents of your namenode (which hold the details about the datanodes), so your namenode will no longer know where your data is. namenode -format also assigns a new namespace ID to the namenode.

To make your datanode work again, you have to change the namespaceID in the datanode to match. It is stored in dfs/data/current/VERSION.

There is an open JIRA, HDFS-107, suggesting that the datanode be formatted as well when you format the namenode.
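The namespaceID mismatch described above can be checked by hand. The following is a minimal sketch, assuming the VERSION files are plain key=value text with '#' comment lines (as written by Hadoop 1.x) and that the storage directories follow the layout mentioned in this thread. The function names are hypothetical helpers, not part of Hadoop.

```python
from pathlib import Path


def read_version_file(path):
    """Parse a Hadoop storage VERSION file (key=value lines) into a dict."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and the '#' timestamp/comment line at the top.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def namespace_ids_match(name_dir, data_dir):
    """Return True only if both VERSION files carry the same namespaceID."""
    nn = read_version_file(Path(name_dir) / "current" / "VERSION")
    dn = read_version_file(Path(data_dir) / "current" / "VERSION")
    nn_id = nn.get("namespaceID")
    return nn_id is not None and nn_id == dn.get("namespaceID")
```

For the setup in the question, one would call namespace_ids_match("/tmp/hadoop-myuser/dfs/name", "/tmp/hadoop-myuser/dfs/data"), assuming the datanode has already been started once so that its VERSION file exists. A False result after reformatting is consistent with the behaviour this answer describes.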

Upvotes: 4

Rengasamy

Reputation: 1043

hadoop namenode -format

  • The Hadoop namenode directory contains the fsimage and edits files, which hold the basic information about the Hadoop file system, such as where the data is located and which user created which files.

  • If you format the namenode, this information is deleted from the namenode directory, which is specified in hdfs-site.xml as dfs.namenode.name.dir.

  • The data itself is still on the Hadoop datanodes, but the namenode metadata for it is gone.
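To see the effect on the namenode directory in practice, one option is to record its contents before and after running the format command and compare the two. The sketch below is only an illustration using the Python standard library. The helper names snapshot and diff_snapshots are hypothetical, and the directory to inspect is whatever your configuration points at (in the question's log it is /tmp/hadoop-myuser/dfs/name).

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file under `directory` to (size in bytes, SHA-256 digest)."""
    root = Path(directory)
    if not root.is_dir():
        # Directory not created yet, e.g. before the very first format.
        return {}
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            rel = path.relative_to(root).as_posix()
            result[rel] = (len(data), hashlib.sha256(data).hexdigest())
    return result


def diff_snapshots(before, after):
    """List the files added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(k for k in before.keys() & after.keys()
                          if before[k] != after[k]),
    }
```

A typical session would call snapshot() on the name directory, run bin/hadoop namenode -format in another terminal, call snapshot() again, and print the result of diff_snapshots(). Running the same comparison on the datanode directory shows whether the block files were touched.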

Upvotes: 11

Steps:

Start all the services using "start-all.sh".

Check whether the services are running using "jps". Note: if you use Hadoop 2.3.0, the following services need to be running:

NameNode
DataNode
ResourceManager
NodeManager

Copy a file from the local file system to HDFS using "hdfs dfs -put <local file> /".

Now check the location "/tmp/hadoop-myuser/dfs/name"; you may find this file split into blocks of up to 64 MB each.

Then format using "hadoop namenode -format". The file is no longer physically available at that location.


Upvotes: 0

Suresh Ram

Reputation: 1034

The command hadoop namenode -format deletes all files in your HDFS.

The tmp directory contains two folders, datanode and namenode, in the local filesystem. If you format the namenode, these two folders become empty.

Note: if you want to format your namenode, first stop all Hadoop services, then delete the tmp folder (which contains the namenode and datanode folders) in your local file system, and then start the Hadoop services again; it will then take effect.

Reason for hadoop namenode -format:

The Hadoop NameNode is the centralized place of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it keeps the metadata related to the datanodes. When we format the namenode, it formats the metadata related to the datanodes. By doing that, all the information about the data on the datanodes is lost, and they can be reused for new data.

By default, the namenode location is "/tmp/hadoop-myuser/dfs/name".

When you format the namenode, this location is cleared.

To change the namenode location, add the following properties to hdfs-site.xml:

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/search/data/dfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/search/data/dfs/datanode</value>
</property>
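To confirm which directory a given configuration actually points at, the file can be read programmatically. This is a rough sketch using only Python's standard library. The helpers load_properties and name_dir are hypothetical, and the fallback path is an assumption taken from the log in the question. Note that dfs.namenode.name.dir is the property name in newer releases, while Hadoop 1.x (as used in the question) reads dfs.name.dir.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Return the <name>/<value> pairs of a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def name_dir(props, user):
    """Pick the configured namenode directory, or an assumed default."""
    # Newer releases use dfs.namenode.name.dir; Hadoop 1.x uses dfs.name.dir.
    for key in ("dfs.namenode.name.dir", "dfs.name.dir"):
        if key in props:
            return props[key]
    # Assumption: the /tmp/hadoop-<user>/dfs/name path seen in the question.
    return "/tmp/hadoop-%s/dfs/name" % user
```

For example, name_dir(load_properties(open("conf/hdfs-site.xml").read()), "myuser") would give the directory to inspect before and after formatting; the conf/hdfs-site.xml path is itself an assumption about where the file lives in your installation.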

I hope this will help you.. :-)

Upvotes: 20

Abhijeet Dhumal

Reputation: 1809

The namenode contains metadata about the Hadoop filesystem.

This command (hadoop-1.2.1$ bin/hadoop namenode -format) will format the whole Hadoop Distributed File System (HDFS). So if you run this command on an existing filesystem, you will lose all your data.

Upvotes: 3
