Daniel Watrous

Reputation: 3861

Hadoop datanode binds wrong IP address

I have a three-node Hadoop cluster running. For some reason, when the datanode slaves start up, they identify themselves with an IP address that doesn't even exist on my network. Here are my hostnames and IP mappings.

nodes:
  - hostname: hadoop-master
    ip: 192.168.51.4
  - hostname: hadoop-data1
    ip: 192.168.52.4
  - hostname: hadoop-data2
    ip: 192.168.52.6

As you can see below, the hadoop-master node starts up properly. Of the other two nodes, however, only one ever shows up as a Live datanode, and whichever one does always reports the IP 192.168.51.1, which as you can see above doesn't even exist on my network.

hadoop@hadoop-master:~$ hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 84482326528 (78.68 GB)
Present Capacity: 75735965696 (70.53 GB)
DFS Remaining: 75735281664 (70.53 GB)
DFS Used: 684032 (668 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.51.1:50010 (192.168.51.1)
Hostname: hadoop-data2
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 303104 (296 KB)
Non DFS Used: 4305530880 (4.01 GB)
DFS Remaining: 37935329280 (35.33 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:54:23 UTC 2015


Name: 192.168.51.4:50010 (hadoop-master)
Hostname: hadoop-master
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 380928 (372 KB)
Non DFS Used: 4440829952 (4.14 GB)
DFS Remaining: 37799952384 (35.20 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:54:21 UTC 2015

I did attempt to add dfs.datanode.address explicitly for each host, but in that case the node failed to even show up as a live node. Here's what my hdfs-site.xml looks like (note that I have tried it both with dfs.datanode.address set and with it absent).

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>192.168.51.4:50010</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
  <property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/hadoop/hadoop-data/hdfs/namenode</value>
   <description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/hadoop/hadoop-data/hdfs/datanode</value>
   <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
</configuration>

Why is Hadoop associating every datanode with an IP that doesn't even exist? More importantly, how can I get the nodes to behave properly?

UPDATE: The /etc/hosts file is identical on all nodes:

192.168.51.4 hadoop-master
192.168.52.4 hadoop-data1
192.168.52.6 hadoop-data2

Below are the contents of my slaves file.

hadoop@hadoop-master:~$ cat /usr/local/hadoop/etc/hadoop/slaves
hadoop-master
hadoop-data1
hadoop-data2

datanode logs:
https://gist.github.com/dwatrous/7241bb804a9be8f9303f
https://gist.github.com/dwatrous/bcd85cda23d6eca3a68b
https://gist.github.com/dwatrous/922c4f773aded0137fa3

namenode logs:
https://gist.github.com/dwatrous/dafaa7695698f36a5d93

Upvotes: 1

Views: 3589

Answers (2)

Daniel Watrous

Reputation: 3861

After reviewing all possible issues, this one appears to be related to some combination of Vagrant and VirtualBox. I was trying to run the master node on one subnet and the datanodes on another. It turned out that, with the networking configured that way, I could communicate between the subnets, but some kind of hidden gateway caused the wrong IP address to be used when the datanodes registered.

The solution was to change my Vagrantfile to put all three hosts on the same subnet. After that everything worked as expected.
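A minimal sketch of what that change looks like, assuming a standard multi-machine VirtualBox setup (the box name and the new datanode addresses on the 192.168.51.0/24 subnet are illustrative, not necessarily the exact values from my setup):

# Vagrantfile sketch: all three hosts on the same host-only subnet
# instead of splitting master (192.168.51.x) and datanodes (192.168.52.x).
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"   # example box, not necessarily the one I used

  {
    "hadoop-master" => "192.168.51.4",
    "hadoop-data1"  => "192.168.51.5",  # illustrative address on the shared subnet
    "hadoop-data2"  => "192.168.51.6",  # illustrative address on the shared subnet
  }.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: ip
    end
  end
end

With all three VMs on one private network, the datanodes register with the namenode using their actual addresses and both show up as live nodes.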

Upvotes: 2

Sumit Chawla

Reputation: 399

Can you post your entire datanode logs? Also, try setting the following property to the name of the interface that carries the IP you want to bind to.

dfs.client.local.interfaces = eth0
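
In hdfs-site.xml form that would look roughly like this (eth0 is just an example; use whichever interface on your VMs actually holds the address you want):

<property>
  <name>dfs.client.local.interfaces</name>
  <value>eth0</value>
  <description>Example only: the interface name whose IP should be used for data transfer between the client and the datanodes.</description>
</property>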

Upvotes: 0
