Reputation: 21
I'm not experienced with HDFS and I've run into a problem with HDFS running on my MacBook. I have an HDFS client that is launched in a Docker container, and every time I try to put or get data to/from HDFS from this container I get the following error:
hdfs dfs -put /core-site.xml hdfs://host.docker.internal:9000/abcs
21/03/02 07:28:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/02 07:28:48 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1610)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
21/03/02 07:28:48 INFO hdfs.DFSClient: Abandoning BP-1485605719-127.0.0.1-1614607405999:blk_1073741832_1008
21/03/02 07:28:48 INFO hdfs.DFSClient: Excluding datanode 127.0.0.1:9866
21/03/02 07:28:48 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /abcs/core-site.xml._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
It can be clearly seen that my client (the container) receives the wrong IP address for the DataNode (127.0.0.1:9866). It should be something like 192.168.65.2:9866, i.e. host.docker.internal, or the domain name of my laptop (e.g. my-laptop).
My core-site.xml (of course, my-laptop is bound to 127.0.0.1 in /etc/hosts; the entry is shown right after the config below):
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://my-laptop:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/Ian_Rakhmatullin/localHadoopTmp</value>
    </property>
</configuration>
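For reference, the /etc/hosts binding mentioned above is just a plain hosts entry on the Mac itself:
# /etc/hosts on the host machine (the Mac)
127.0.0.1    my-laptop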
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>my-laptop:9866</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>my-laptop:9864</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>my-laptop:9867</value>
    </property>
</configuration>
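One way to check which address the NameNode actually recorded for the DataNode (independently of the web UI mentioned below) is the dfsadmin report, run on the host:
hdfs dfsadmin -report
# the "Name:" and "Hostname:" lines of each DataNode entry show how it registered with the NameNode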
One more thing that confuses me is that through the HDFS web UI I can see that the DataNode is running on localhost:9866 (127.0.0.1:9866), but I would expect "my-laptop:9866" there as well.
Does anyone have any thoughts on how to resolve this issue? Thank you.
Upvotes: 1
Views: 613
Reputation: 21
It seems I've solved this problem by following these steps:
hdfs-site.xml:
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
</property>
<!-- the DataNode now registers with the NameNode under this hostname instead of 127.0.0.1 -->
<property>
    <name>dfs.datanode.hostname</name>
    <value>my-laptop</value>
</property>
core-site.xml stays the same as in my question.
For the HDFS client inside the Docker container, set dfs.client.use.datanode.hostname as well:
<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
and add this mapping to the container's /etc/hosts (192.168.65.2 is host.docker.internal):
192.168.65.2 my-laptop
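If you start the container yourself, one way to get that mapping in place is Docker's --add-host flag (the image name below is just a placeholder):
docker run --add-host my-laptop:192.168.65.2 -it my-hdfs-client bash
# newer Docker versions also accept --add-host my-laptop:host-gateway, which resolves to the host automatically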
With this approach, the NameNode returns the hostname of the DataNode to the HDFS client, and the client then resolves that hostname through the /etc/hosts mapping to 192.168.65.2 (host.docker.internal). And this is what I needed.
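After that, the original command from the question works from inside the container, for example:
hdfs dfs -put /core-site.xml hdfs://host.docker.internal:9000/abcs
hdfs dfs -ls hdfs://host.docker.internal:9000/abcs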
Upvotes: 1