Edward67

Reputation: 35

Data lost after shutting down hadoop HDFS?

Hi, I'm learning Hadoop and I have a simple, dumb question: after I shut down HDFS (by calling hadoop_home/sbin/stop-dfs.sh), is the data on HDFS lost, or can I get it back?

Upvotes: 3

Views: 3754

Answers (2)

Ashrith

Reputation: 6855

Data wouldn't be lost when you stop HDFS, provided you store the NameNode's and DataNodes' data in persistent locations specified using these properties:

  • dfs.namenode.name.dir -> Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy. Default value: file://${hadoop.tmp.dir}/dfs/name
  • dfs.datanode.data.dir -> Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. Default value: file://${hadoop.tmp.dir}/dfs/data
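For example, a minimal hdfs-site.xml that points both properties at persistent locations might look like this (the /data/hadoop paths are just placeholders; substitute directories on a disk that survives reboots):

```xml
<configuration>
  <!-- NameNode metadata (fsimage, edit logs); comma-separate multiple dirs for redundancy -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfs/name</value>
  </property>
  <!-- DataNode block storage; comma-separate multiple dirs, typically one per disk -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/dfs/data</value>
  </property>
</configuration>
```

Note that if you change these properties on a cluster that already holds data, you need to copy the existing contents into the new locations before restarting, or the NameNode won't find its fsimage.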

As you can see, the default values for both properties point to ${hadoop.tmp.dir}, which by default lives under /tmp. You might already know that data in /tmp on Unix-based systems gets cleared on reboot.
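Alternatively, instead of overriding both properties individually, you can move the common root by setting hadoop.tmp.dir in core-site.xml to a persistent path (again, /data/hadoop is a placeholder):

```xml
<configuration>
  <!-- dfs.namenode.name.dir and dfs.datanode.data.dir both default to
       paths under ${hadoop.tmp.dir}, so relocating it moves them too -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
```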

So, if you specify directory locations other than /tmp, the Hadoop HDFS daemons will be able to read the data back after a reboot, and hence there is no data loss even across cluster restarts.

Upvotes: 11

Bector

Reputation: 1334

Please make sure you are not deleting the metadata of the data stored in HDFS. You can ensure this simply by leaving dfs.namenode.name.dir and dfs.datanode.data.dir untouched, i.e. not deleting the paths specified by these properties in your hdfs-site.xml file.

Upvotes: 0

Related Questions