Reputation: 742
I saw in the Hadoop documentation that the default value for hadoop.tmp.dir is /tmp/hadoop-${user.name}, but if I set it that way, do I lose the data when the machine restarts?
I mean, maybe I should not set this in the real /tmp, and should use something like /home/myuser/tmp/hadoop-${user.name} instead?
Thank you in advance!
Additional information:
I set it to /tmp/hadoop-hduser, but the computer shut down unexpectedly because of a power failure, and today I got this message: Call From java.net.UnknownHostException: hduser-machine: hduser-machine to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Then I ran the command hadoop namenode and got this:
ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hduser/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
And I thought that it was due to the /tmp configuration...
Upvotes: 1
Views: 2735
Reputation: 4179
By default, Hadoop is configured to run out of the box. To achieve this, all important (non-temporary) dirs point inside ${hadoop.tmp.dir}, which in turn points into /tmp, which is present on all Linux systems.
So if you want the data to survive a reboot, you also need to adjust the other important paths, see hdfs-default.xml:
dfs.namenode.name.dir
dfs.datanode.data.dir
dfs.namenode.checkpoint.dir
They are separate options because in a real-world environment it may make sense to put temporary and non-temporary data on different physical storage devices. But if the setup is small, then technically yes, you may point ${hadoop.tmp.dir} to some persistent (non-/tmp) place and ignore what I wrote above.
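As a rough sketch of the two options, assuming a single-node setup: the /home/myuser/hadoop-data paths below are placeholders I made up, so adjust them to your own layout. For a small setup, moving hadoop.tmp.dir in core-site.xml is probably enough, since the HDFS directories default to locations under it:

```xml
<!-- core-site.xml: move the base directory out of /tmp (path is a placeholder) -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/myuser/hadoop-data/tmp-${user.name}</value>
  </property>
</configuration>
```

Alternatively, leave hadoop.tmp.dir alone and set the persistent directories explicitly in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: pin the non-temporary HDFS directories (paths are placeholders) -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/myuser/hadoop-data/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/myuser/hadoop-data/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///home/myuser/hadoop-data/dfs/namesecondary</value>
  </property>
</configuration>
```

Keep in mind that a new, empty name directory has to be formatted before the NameNode will start (hdfs namenode -format, or hadoop namenode -format on older releases), and that formatting starts from an empty filesystem.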
Upvotes: 2