Reputation: 35
Hi, I'm learning Hadoop and I have a simple, dumb question: after I shut down HDFS (by calling hadoop_home/sbin/stop-dfs.sh), is the data on HDFS lost, or can I get it back?
Upvotes: 3
Views: 3754
Reputation: 6855
Data won't be lost when you stop HDFS, provided you store the NameNode's and DataNodes' data in persistent locations specified using these properties:
dfs.namenode.name.dir
-> Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy. Default value: file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir
-> Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. Default value: file://${hadoop.tmp.dir}/dfs/data
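For example, a minimal hdfs-site.xml sketch that points both properties at persistent directories (the /data/hadoop paths here are just placeholders; use any location that survives reboots):

    <configuration>
      <!-- where the NameNode keeps the fsimage and edit logs -->
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/dfs/name</value>
      </property>
      <!-- where each DataNode keeps its block files -->
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/dfs/data</value>
      </property>
    </configuration>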
As you can see, the default values for both properties point to ${hadoop.tmp.dir}, which by default is /tmp. You might already know that data in /tmp on Unix-based systems gets cleared on reboot.

So, if you specify directory locations other than /tmp, the Hadoop HDFS daemons will be able to read the data back after a reboot, and hence there is no data loss even on a cluster restart.
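You can verify this yourself with a quick sketch (assuming HADOOP_HOME is set and the directories are configured as above):

    # write a small test file into HDFS
    echo "hello" > /tmp/hello.txt
    hdfs dfs -mkdir -p /user/test
    hdfs dfs -put /tmp/hello.txt /user/test/hello.txt

    # stop and restart all HDFS daemons
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh

    # the file is still there, since the storage dirs are persistent
    hdfs dfs -cat /user/test/hello.txt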
Upvotes: 11
Reputation: 1334
Please make sure you are not deleting the metadata of the data stored in HDFS. You can achieve this simply by keeping dfs.namenode.name.dir and dfs.datanode.data.dir untouched, meaning: do not delete the paths configured in these properties in your hdfs-site.xml file.
Upvotes: 0