Reputation: 21393
I am trying to learn Hadoop by following a tutorial and trying to do pseudo-distributed mode on my machine.
My core-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.
</description>
</property>
</configuration>
My hdfs-site.xml file is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>The actual number of replications can be specified when the
file is created.
</description>
</property>
</configuration>
My mapred-site.xml file is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs
at.
</description>
</property>
</configuration>
When I run the following command it completes successfully, but what is it actually doing?
hadoop-1.2.1$ bin/hadoop namenode -format
14/11/26 12:37:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = myhost/127.0.0.8
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.6.0_45
************************************************************/
14/11/26 12:37:17 INFO util.GSet: Computing capacity for map BlocksMap
14/11/26 12:37:17 INFO util.GSet: VM type = 64-bit
14/11/26 12:37:17 INFO util.GSet: 2.0% max memory = 932118528
14/11/26 12:37:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/11/26 12:37:17 INFO util.GSet: recommended=2097152, actual=2097152
14/11/26 12:37:17 INFO namenode.FSNamesystem: fsOwner=myuser
14/11/26 12:37:17 INFO namenode.FSNamesystem: supergroup=supergroup
14/11/26 12:37:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/11/26 12:37:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/11/26 12:37:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/11/26 12:37:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/11/26 12:37:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/11/26 12:37:17 INFO common.Storage: Image file /tmp/hadoop-myuser/dfs/name/current/fsimage of size 115 bytes saved in 0 seconds.
14/11/26 12:37:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO common.Storage: Storage directory /tmp/hadoop-myuser/dfs/name has been successfully formatted.
14/11/26 12:37:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at chaitanya-OptiPlex-3010/127.0.0.8
************************************************************/
Can someone please explain what it is doing internally?
I have gone through these posts, but neither gives a clear explanation:
What exactly is hadoop namenode formatting?
hadoop namenode is not formatting
How can I check this in practice on my machine, so that I can see the differences before and after running the command? I am new to Hadoop, so this may be a trivial question.
Upvotes: 27
Views: 73211
Reputation: 3990
Formatting the namenode does not format the datanode.
It only wipes the contents of the namenode's storage directory, that is, the metadata about the data stored on the datanodes. After that, the namenode no longer knows where your data is. namenode -format also assigns a new namespaceID to the namenode.
For the datanode to work again, you have to change the namespaceID in the datanode's VERSION file so that it matches the namenode's. That file is at dfs/data/current/VERSION.
There is an open JIRA, HDFS-107, suggesting that the datanode be formatted as well when you format the namenode.
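To see the namespaceID mismatch described above, the two VERSION files can be compared directly. The following is only a minimal sketch: it assumes the Hadoop 1.x storage layout (a `current/VERSION` file in Java-properties format under both the name and data directories), and the helper names `read_version_file` and `namespace_ids_match` are made up for illustration.

```python
from pathlib import Path


def read_version_file(path):
    """Parse a Java-properties style VERSION file into a dict."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header at the top of the file.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def namespace_ids_match(name_dir, data_dir):
    """Return (matches, namenode_id, datanode_id) for the two storage dirs."""
    nn = read_version_file(Path(name_dir) / "current" / "VERSION").get("namespaceID")
    dn = read_version_file(Path(data_dir) / "current" / "VERSION").get("namespaceID")
    return nn == dn, nn, dn


# Example call, assuming the default locations from the question:
# namespace_ids_match("/tmp/hadoop-myuser/dfs/name", "/tmp/hadoop-myuser/dfs/data")
```

If the first element of the result is False after a format, the datanode still carries the old ID and will have to be updated (or its data directory cleared) before it can rejoin.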
Upvotes: 4
Reputation: 1043
hadoop namenode -format
The namenode directory contains the fsimage and edits files, which hold the basic information about the Hadoop file system, such as where the data is located and which user created which files.
If you format the namenode, this information is deleted from the namenode directory, which is specified in hdfs-site.xml as dfs.namenode.name.dir (dfs.name.dir in Hadoop 1.x).
You still have the data blocks on the datanodes, but not the namenode metadata that describes them.
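A toy model may make the distinction clearer. This is not Hadoop code, only an illustrative sketch under simplifying assumptions: the class names `ToyNameNode` and `ToyDataNode` are invented, and the real namenode persists its state in fsimage and edits rather than in a Python dict.

```python
import random


class ToyNameNode:
    """Holds only metadata: which blocks make up which file."""

    def __init__(self):
        self.namespace_id = random.randint(1, 2**31 - 1)
        self.files = {}  # path -> list of block ids

    def add_file(self, path, block_ids):
        self.files[path] = list(block_ids)

    def format(self):
        """Wipe the metadata and pick a new, different namespace ID."""
        old = self.namespace_id
        self.files = {}
        while self.namespace_id == old:
            self.namespace_id = random.randint(1, 2**31 - 1)


class ToyDataNode:
    """Holds the actual block contents and remembers the namespace it joined."""

    def __init__(self, namespace_id):
        self.namespace_id = namespace_id
        self.blocks = {}  # block id -> bytes

    def can_register_with(self, namenode):
        return self.namespace_id == namenode.namespace_id


nn = ToyNameNode()
dn = ToyDataNode(nn.namespace_id)
dn.blocks["blk_1"] = b"hello"
nn.add_file("/user/myuser/a.txt", ["blk_1"])

nn.format()

print(nn.files)                  # {}
print(sorted(dn.blocks))         # ['blk_1']
print(dn.can_register_with(nn))  # False
```

After `format()` the block is still sitting on the toy datanode, but nothing maps a file name to it any more, and the datanode's stale namespace ID no longer matches.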
Upvotes: 11
Reputation: 530
Steps
Start all the services using "start-all.sh".
Check whether the services are running using "jps".
Note: if you use Hadoop 2.3.0, the following services need to be running:
NameNode
DataNode
ResourceManager
NodeManager
Copy a file from the local file system to HDFS using "hdfs dfs -put <local-file> /".
Now check the location "/tmp/hadoop-myuser/dfs": the "name" directory holds the metadata for the file, and the "data" directory holds the file itself, split into blocks of up to 64 MB each (the Hadoop 1.x default block size).
Then format the namenode using "hadoop namenode -format".
Now the metadata for the file is no longer present in the "name" directory.
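The before-and-after check in these steps could be scripted roughly as follows. It is a sketch, not a Hadoop tool: `snapshot` and `diff_snapshots` are hypothetical helper names, and the path in the usage comment assumes the default storage location shown in the question's log.

```python
import os


def snapshot(root):
    """Map each file under root (as a relative path) to its size in bytes."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = os.path.getsize(full)
    return result


def diff_snapshots(before, after):
    """Return (removed, added, resized) relative paths, each list sorted."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    resized = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return removed, added, resized


# Usage, assuming the default location from the question:
# before = snapshot("/tmp/hadoop-myuser/dfs")
# ... run the format command in another terminal ...
# after = snapshot("/tmp/hadoop-myuser/dfs")
# print(diff_snapshots(before, after))
```

Comparing sizes only is a deliberate simplification; files rewritten with identical sizes would not show up, so comparing modification times or checksums would be a stricter variant.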
Upvotes: 0
Reputation: 1034
hadoop namenode -format
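This command makes every file in your HDFS unreachable, because it deletes the namenode metadata that describes them.
By default the Hadoop tmp directory on the local file system contains two folders, one for the datanode and one for the namenode. Formatting the namenode empties the namenode folder; the datanode folder is left as it is.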
Note: if you want to format your namenode, first stop all Hadoop services and delete the tmp folder (which contains the namenode and datanode folders) on your local file system; then format and start the Hadoop services again, and the change will take effect.
Reason for hadoop namenode -format:
The Hadoop NameNode is the central component of an HDFS file system: it keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it keeps the metadata about the data on the datanodes. When we format the namenode, this metadata is erased. As a result, all information about the data on the datanodes is lost, and the datanodes can be reused for new data.
By default the namenode location is "/tmp/hadoop-myuser/dfs/name".
When you format the namenode, this location is cleared.
To change the namenode location, add the following properties to hdfs-site.xml (these are the Hadoop 2.x names; in Hadoop 1.x the equivalents are dfs.name.dir and dfs.data.dir):
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/search/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/search/data/dfs/datanode</value>
</property>
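To double-check which directories a config file actually points at, the properties can be read back with a few lines of Python. This is a rough sketch using only the standard library; `read_hadoop_properties` is a made-up helper name, and the sample string simply repeats the two properties above.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Sample content mirroring the properties shown above.
sample = """<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/search/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/search/data/dfs/datanode</value>
</property>
</configuration>"""

props = read_hadoop_properties(sample)
for key in ("dfs.namenode.name.dir", "dfs.datanode.data.dir"):
    print(key, "=", props.get(key))
```

To inspect a real installation, pass the text of your own hdfs-site.xml instead of the sample string. A property missing from the file comes back as None, which means Hadoop falls back to its built-in default for it.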
I hope this will help you.. :-)
Upvotes: 20
Reputation: 1809
The namenode holds the metadata about the Hadoop file system.
This command (hadoop-1.2.1$ bin/hadoop namenode -format) formats the whole Hadoop distributed file system (HDFS). If you run it on an existing file system, you will lose access to all the data in it.
Upvotes: 3