mjbsgll

Reputation: 742

Setting hadoop.tmp.dir in /tmp

I saw in the Hadoop documentation that the default value for hadoop.tmp.dir is /tmp/hadoop-${user.name}, but if I set it that way, will I lose the data when the machine restarts?

I mean, maybe I should not set this to the real /tmp, but instead to something like /home/myuser/tmp/hadoop-${user.name}?
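In other words, is the idea to put something like this in core-site.xml? (The path here is just an example under my home directory.)

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/myuser/tmp/hadoop-${user.name}</value>
      </property>
    </configuration>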

Thank you in advance!

Adding information:

I set it to /tmp/hadoop-hduser, but the computer was shut down due to electrical power problems, and today I got this message:

    Call From java.net.UnknownHostException: hduser-machine: hduser-machine to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Then I ran hadoop namenode and got this:

    ERROR namenode.NameNode: Failed to start namenode. org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hduser/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

And I thought that it was due to the /tmp configuration...

Upvotes: 1

Views: 2735

Answers (1)

gudok

Reputation: 4179

By default, Hadoop is configured to run out of the box. To achieve this, all important (non-temporary) directories point inside ${hadoop.tmp.dir}, which in turn points to /tmp, which is present on all Linux systems.

As such, if you want your data to survive /tmp being wiped, you also need to adjust the other important paths; see hdfs-default.xml (an example configuration follows the list):

dfs.namenode.name.dir
dfs.datanode.data.dir
dfs.namenode.checkpoint.dir
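A minimal sketch of overriding these in hdfs-site.xml, assuming a persistent directory such as /home/hduser/hadoop_data exists and is writable by the Hadoop user (the paths are only illustrative):

    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hadoop_data/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hadoop_data/dfs/data</value>
      </property>
      <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///home/hduser/hadoop_data/dfs/namesecondary</value>
      </property>
    </configuration>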

They are separate options because in a real-world environment it may be desirable to distribute temporary and non-temporary data across different physical storage devices. But if your setup is small, then technically yes, you may point ${hadoop.tmp.dir} to some persistent (non-/tmp) location and ignore what I wrote above.
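For that simpler route, a minimal core-site.xml sketch, assuming /home/hduser/hadoop_tmp is a hypothetical persistent directory owned by the Hadoop user:

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hduser/hadoop_tmp</value>
      </property>
    </configuration>

Note that pointing the namenode at a new, empty directory means you have to format it again (hdfs namenode -format), which creates a fresh, empty filesystem.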

Upvotes: 2
