Reputation: 1367
Because of many errors, I can't figure out why my slave VM's DataNode is not connecting to my master VM. Any suggestion is welcome so I can try it. To start, here is one of the errors in my slave VM's log:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000
Because of this, I can't run the job I want on my master VM:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
which gives me this error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/QuasiMonteCarlo_1386793331690_1605707775/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
And even hdfs dfsadmin -report (run on the master VM) gives me all zeros:
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
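A quick way to confirm basic reachability from a slave to the NameNode (a diagnostic sketch; the hostname and port come from fs.default.name in core-site.xml below):

# Run on a slave VM: does the master's hostname resolve, and is port 9000 reachable?
getent hosts ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
nc -zv ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8 9000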
For this setup, I built three Ubuntu VMs on OpenStack: one master and two slaves.
On the master, /etc/hosts contains:
127.0.0.1 localhost
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
50.50.1.8 slave1
50.50.1.4 slave2
core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
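I haven't listed yarn-site.xml. For reference, a minimal sketch of the property that points NodeManagers at the ResourceManager (the hostname here is assumed to be the master's, taken from /etc/hosts above):

<!-- Without this, yarn.resourcemanager.resource-tracker.address
     defaults to 0.0.0.0:8031 on every node. -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8</value>
</property>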
And my slaves file contains one hostname per line: slave1 and slave2.
None of the logs on the master VM contain errors, but the slave VM shows the connection error above, and the NodeManager log shows an error as well:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
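A quick check of that port on the master (a sketch; 8031 is the ResourceManager's resource-tracker port):

# Run on the master: is anything listening on 8031, and on which address?
sudo netstat -tlnp | grep 8031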
From my slave machine: core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
</property>
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
</property>
And in my /etc/hosts:
127.0.0.1 localhost
50.50.1.8 ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
The jps output on the master:
15863 ResourceManager
15205 SecondaryNameNode
14967 NameNode
16194 Jps
And on a slave:
1988 Jps
1365 DataNode
1894 NodeManager
Upvotes: 7
Views: 16459
Reputation: 11
In my case, I used hdfs datanode -format to format the DataNode and hdfs namenode -format to format the NameNode. Before that, make sure you delete all files in the data folders that are referenced in the hdfs-site.xml file.
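A sketch of that sequence, with the data directory paths assumed from the question's hdfs-site.xml (note that formatting erases everything stored in HDFS):

stop-dfs.sh
# Paths from dfs.namenode.name.dir / dfs.datanode.data.dir in the question;
# run the datanode cleanup on every slave as well.
rm -rf /home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode/*
rm -rf /home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode/*
hdfs namenode -format
start-dfs.sh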
Upvotes: 1
Reputation: 21
I struggled a lot and finally got it working after running "systemctl stop firewalld". Before that, I had also disabled SELinux and IPv6.
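A sketch of those steps (assuming a systemd-based distribution with firewalld; opening only the specific Hadoop ports is safer than stopping the firewall entirely):

sudo systemctl stop firewalld                      # stop the firewall now
sudo systemctl disable firewalld                   # keep it off after reboot
sudo setenforce 0                                  # SELinux permissive until reboot
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1    # disable IPv6 at runtime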
Upvotes: 2
Reputation: 1367
Of all the errors shown, the one below was the main reason the master could not connect to the slaves:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
Basically, 0.0.0.0:8031 is the port of yarn.resourcemanager.resource-tracker.address, so I checked with lsof -i :8031 and found that the port wasn't enabled/open/allowed. Since I'm using OpenStack (cloud), I added 8031 and the other ports that were showing errors to my security group rules, and voilà, it worked as intended.
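With the OpenStack CLI, allowing such a port in a security group looks roughly like this (a sketch; the group name "default" is an assumption):

# Allow inbound TCP 8031 (resource tracker); repeat for 9000 and any other failing ports.
openstack security group rule create --protocol tcp --dst-port 8031 default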
Upvotes: 4