Should the Hadoop installation path be the same across nodes?

Question

Hadoop 2.7 is installed at /opt/pro/hadoop/hadoop-2.7.3 on the master. I copied the whole installation to the slave, but into a different directory, /opt/pro/hadoop-2.7.3. I then updated the environment variables (e.g., HADOOP_HOME) and hdfs-site.xml (for the namenode and datanode) on the slave machine.

Now I can run hadoop version on the slave successfully. However, on the master, start-dfs.sh fails with this message:

17/02/18 10:24:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-namenode-shijie-ThinkPad-T410.out
master: starting datanode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-datanode-shijie-ThinkPad-T410.out
slave: bash: line 0: cd: /opt/pro/hadoop/hadoop-2.7.3: No such file or directory
slave: bash: /opt/pro/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-secondarynamenode-shijie-ThinkPad-T410.out
17/02/18 10:26:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Hadoop uses the master's HADOOP_HOME (/opt/pro/hadoop/hadoop-2.7.3) on the slave, while the slave's HADOOP_HOME is /opt/pro/hadoop-2.7.3. So should HADOOP_HOME be the same on every node at installation time?

.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-7-openjdk-amd64/bin

export HADOOP_HOME=/opt/pro/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

hadoop-env.sh

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

On the slave, $HADOOP_HOME/etc/hadoop contains a masters file:

xx@wodaxia:/opt/pro/hadoop-2.7.3/etc/hadoop$ cat masters 
master

franklinsijo · Accepted Answer

No, not necessarily. But if the paths differ among the nodes, you cannot use scripts such as start-dfs.sh and stop-dfs.sh, or their YARN equivalents. These scripts use the $HADOOP_PREFIX value of the node where the script is executed, and they apply it to every node they contact.

Here is the line from hadoop-daemons.sh that start-dfs.sh uses to start all the datanodes:

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

The script is written this way on the assumption that every node in the cluster uses the same $HADOOP_PREFIX (or $HADOOP_HOME, which is deprecated) path.
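To make the failure in the question concrete, here is a minimal sketch of the command line that ends up being run on each slave. The helper build_remote_cmd is hypothetical (it is not part of Hadoop), it assumes that $bin resolves to $HADOOP_PREFIX/sbin on the launching node, and it simplifies the arguments that start-dfs.sh actually passes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): prints the command line that
# hadoop-daemons.sh effectively asks each slave to run over ssh.
# Assumption: $bin on the launching node equals $HADOOP_PREFIX/sbin.
build_remote_cmd() {
  local prefix="$1"    # $HADOOP_PREFIX as seen on the launching node
  local conf_dir="$2"  # $HADOOP_CONF_DIR as seen on the launching node
  shift 2
  # Both the cd target and the script path come from the launching node,
  # never from the slave's own environment.
  printf 'cd %s ; %s/sbin/hadoop-daemon.sh --config %s %s\n' \
    "$prefix" "$prefix" "$conf_dir" "$*"
}

# Values taken from the master in the question.
MASTER_PREFIX=/opt/pro/hadoop/hadoop-2.7.3
build_remote_cmd "$MASTER_PREFIX" "$MASTER_PREFIX/etc/hadoop" start datanode
```

Every path in the printed command is the master's path, which is why the slave reports "No such file or directory" for /opt/pro/hadoop/hadoop-2.7.3 even though its own HADOOP_HOME is set correctly.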

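To work around this, either:

1) Use the same path on all the nodes, or

2) Log in to each node in the cluster and start the HDFS daemons that belong on that node (namenode, secondarynamenode or datanode). For example, on a datanode:

$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode

The same procedure applies to YARN (resourcemanager or nodemanager). For example, on a worker node:

$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager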
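If logging in to every node by hand becomes tedious, option 2 can be scripted. The following is only a dry-run sketch: node_start_cmd is a hypothetical helper, the host names and paths are the ones from the question, and it assumes passwordless ssh to each node and that JAVA_HOME is set in each node's hadoop-env.sh. It prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) the ssh command that starts
# one HDFS daemon using that node's own installation path.
node_start_cmd() {
  local node="$1"      # host name of the target node
  local node_home="$2" # Hadoop installation path on that node
  local service="$3"   # namenode, datanode, ...
  printf "ssh %s '%s/sbin/hadoop-daemon.sh --config %s/etc/hadoop start %s'\n" \
    "$node" "$node_home" "$node_home" "$service"
}

# One line per daemon: host, node-local install path, daemon name.
while read -r host home daemon; do
  node_start_cmd "$host" "$home" "$daemon"
done <<'EOF'
master /opt/pro/hadoop/hadoop-2.7.3 namenode
master /opt/pro/hadoop/hadoop-2.7.3 datanode
slave /opt/pro/hadoop-2.7.3 datanode
EOF
```

Once the printed commands look right, they can be run one by one, or the output can be piped to a shell. The table of hosts and paths has to be maintained by hand, which is the price of not keeping the path identical on all nodes.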
