I want to upload a file from an external Windows server to HDFS running on a different server. HDFS is part of a Cloudera Docker container on that server.
I connect to HDFS from the Windows server as follows:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://%HDFS_SERVER_IP%:8020");
FileSystem fs = FileSystem.get(conf);
When I call fs.copyFromLocalFile(localFilePath, hdfsFilePath), it throws the exception below and creates an empty file in HDFS:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/test/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1595)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3287)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1455)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1251)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
There also seems to be a problem with the DataNode; the following is copied from its log:
Retrying connect to server: 0.0.0.0/0.0.0.0:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I formatted the DataNodes and restarted HDFS, but I still can't upload the file in this setup. Other operations, such as reading and writing files, work with this configuration, and the upload succeeds when the local file system and HDFS are on the same server.
The servers connect through a proxy server, and I configured the proxy environment in the HDFS Docker container. How can a file be transferred between different servers with the HDFS Java API?
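To narrow down which leg of the write fails, it can help to check from the Windows client whether both TCP endpoints an HDFS write needs are reachable: the NameNode RPC port from fs.defaultFS (8020) and the DataNode data-transfer port (50010, from dfs.datanode.address). With the HDFS Java API the NameNode only hands out block locations; the client then streams the bytes directly to the DataNode it is given, and a DataNode the client cannot connect to gets excluded, which is consistent with "1 node(s) are excluded" in the exception. This is a minimal JDK-only sketch; the host argument is a placeholder you supply and the 3-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java PortProbe <hdfs-server-host>");
            return;
        }
        String host = args[0];
        int[] ports = {8020, 50010}; // NameNode RPC, DataNode data transfer
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable = " + isReachable(host, port, 3000));
        }
    }
}
```

If 8020 is reachable but 50010 is not (or the DataNode host name the NameNode reports does not resolve on the client), the metadata call succeeds and the empty file is created, but the block write fails.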
Update 1:
hdfs dfsadmin -report:
[root@quickstart conf]# hdfs dfsadmin -report
Configured Capacity: 211243687936 (196.74 GB)
Present Capacity: 78773199014 (73.36 GB)
DFS Remaining: 77924307110 (72.57 GB)
DFS Used: 848891904 (809.57 MB)
DFS Used%: 1.08%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: XXXX:50010 (quickstart.cloudera)
Hostname: quickstart.cloudera
Decommission Status : Normal
Configured Capacity: 211243687936 (196.74 GB)
DFS Used: 848891904 (809.57 MB)
Non DFS Used: 132470488922 (123.37 GB)
DFS Remaining: 77924307110 (72.57 GB)
DFS Used%: 0.40%
DFS Remaining%: 36.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 6
Last contact: Wed Apr 05 07:15:00 UTC 2017
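The report also suggests the failure is not a full disk (a common cause of the same "could only be replicated to 0 nodes" error): the single DataNode still has about 72.57 GB remaining. The two DFS Used% values (1.08% in the summary, 0.40% for the node) differ only in their denominator. The sketch below recomputes the figures from the byte counts printed above; the formulas are inferred from how these numbers line up, not copied from the dfsadmin source.

```java
import java.util.Locale;

class ReportMath {
    // Byte counts copied from the hdfs dfsadmin -report output above.
    static final long CONFIGURED = 211243687936L;
    static final long DFS_USED = 848891904L;
    static final long DFS_REMAINING = 77924307110L;

    // Formats part/whole as a percentage with two decimals, e.g. "1.08".
    static String percent(long part, long whole) {
        return String.format(Locale.ROOT, "%.2f", part * 100.0 / whole);
    }

    public static void main(String[] args) {
        // Present capacity: the space HDFS can actually use (DFS Used + DFS Remaining).
        long present = DFS_USED + DFS_REMAINING;
        // Non DFS Used: the rest of the configured capacity.
        long nonDfs = CONFIGURED - present;
        System.out.println("Present Capacity: " + present);                                    // 78773199014
        System.out.println("Non DFS Used: " + nonDfs);                                         // 132470488922
        System.out.println("Summary DFS Used%: " + percent(DFS_USED, present) + "%");          // 1.08%
        System.out.println("Node DFS Used%: " + percent(DFS_USED, CONFIGURED) + "%");          // 0.40%
        System.out.println("Node DFS Remaining%: " + percent(DFS_REMAINING, CONFIGURED) + "%"); // 36.89%
    }
}
```

All five values match the report exactly, so the summary percentage is relative to present capacity and the per-node percentages are relative to configured capacity.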
yarn node -list -all:
17/04/05 07:14:02 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total Nodes:1
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
quickstart.cloudera:37449 RUNNING quickstart.cloudera:8042 0
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
</property>
<!-- OOZIE proxy user setting -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<!-- HTTPFS proxy user setting -->
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<!-- Llama proxy user setting -->
<property>
<name>hadoop.proxyuser.llama.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.llama.groups</name>
<value>*</value>
</property>
<!-- Hue proxy user setting -->
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Immediately exit safemode as soon as one DataNode checks in.
On a multi-node cluster, these configurations must be removed. -->
<property>
<name>dfs.safemode.extension</name>
<value>0</value>
</property>
<property>
<name>dfs.safemode.min.datanodes</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.safemode.min.datanodes</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop-hdfs/cache/${user.name}</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/name</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/namesecondary</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/data</value>
</property>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>0.0.0.0:8022</value>
</property>
<property>
<name>dfs.https.address</name>
<value>0.0.0.0:50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>0.0.0.0:50070</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>0.0.0.0:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>0.0.0.0:50495</value>
</property>
<!-- Impala configuration -->
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout.millis</name>
<value>10000</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
</configuration>
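Many address properties in this hdfs-site.xml use the wildcard 0.0.0.0, which means "listen on all interfaces". That is fine for a server socket, but per the Hadoop defaults documentation, DataNodes connect to dfs.namenode.servicerpc-address when it is set, and its value here is 0.0.0.0:8022, the same address the DataNode log keeps retrying. The JDK-only sketch below lists every property whose value starts with "0.0.0.0:" so such entries are easy to review; it deliberately skips dfs.namenode.rpc-bind-host (a bare host with no port, meant to be a wildcard). It is a diagnostic aid under those assumptions, not a fix.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WildcardScan {
    // Returns "name=value" for each <property> whose value begins with "0.0.0.0:".
    static List<String> wildcardAddresses(Document doc) {
        List<String> hits = new ArrayList<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String value = text(prop, "value");
            if (value.startsWith("0.0.0.0:")) {
                hits.add(text(prop, "name") + "=" + value);
            }
        }
        return hits;
    }

    // Trimmed text of the first descendant element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java WildcardScan /path/to/hdfs-site.xml");
            return;
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File(args[0]));
        for (String hit : wildcardAddresses(doc)) {
            System.out.println(hit);
        }
    }
}
```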
I only changed conf.set("fs.defaultFS", "hdfs://%HDFS_SERVER_IP%:8020") to conf.set("fs.defaultFS", "webhdfs://%HDFS_SERVER_IP%:50070"), and then I successfully uploaded the files to HDFS on the other server. I referred to this link.
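With the webhdfs:// scheme the client talks HTTP to the NameNode's web port (50070 here, from dfs.namenode.http-address) instead of RPC on 8020. In the WebHDFS REST API a file upload is a two-step PUT: the NameNode answers the first request with HTTP 307 and a Location header pointing at a DataNode URL, and the client then sends the file bytes there, expecting HTTP 201. The JDK-only sketch below makes those two calls by hand to roughly show what happens on the wire; the host, HDFS path and user.name value are placeholders, and it assumes security is off so the user.name query parameter is accepted. The DataNode named in the Location header must still be reachable from the client.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WebHdfsUpload {
    // Step-1 URL for the NameNode; assumes hdfsPath starts with "/" and needs no escaping.
    static String createUrl(String host, int port, String hdfsPath, String user) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&overwrite=true&user.name="
                + URLEncoder.encode(user, StandardCharsets.UTF_8);
    }

    static void upload(String host, int port, Path localFile, String hdfsPath, String user)
            throws IOException {
        // Step 1: PUT with an empty body to the NameNode; keep the redirect instead of following it.
        HttpURLConnection nn = (HttpURLConnection)
                URI.create(createUrl(host, port, hdfsPath, user)).toURL().openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        nn.setDoOutput(true);
        nn.getOutputStream().close(); // empty request body
        int status = nn.getResponseCode();
        String location = nn.getHeaderField("Location");
        nn.disconnect();
        if (status != 307 || location == null) {
            throw new IOException("expected HTTP 307 from NameNode, got " + status);
        }

        // Step 2: stream the file to the DataNode URL taken from the Location header.
        HttpURLConnection dn = (HttpURLConnection) URI.create(location).toURL().openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        dn.setRequestProperty("Content-Type", "application/octet-stream");
        dn.setFixedLengthStreamingMode(Files.size(localFile)); // avoid buffering the whole file
        try (OutputStream out = dn.getOutputStream()) {
            Files.copy(localFile, out);
        }
        int created = dn.getResponseCode();
        dn.disconnect();
        if (created != HttpURLConnection.HTTP_CREATED) {
            throw new IOException("expected HTTP 201 from DataNode, got " + created);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.out.println("usage: java WebHdfsUpload <host> <localFile> <hdfsPath> <user>");
            return;
        }
        // 50070 is the NameNode HTTP port from dfs.namenode.http-address in the question.
        upload(args[0], 50070, Paths.get(args[1]), args[2], args[3]);
        System.out.println("uploaded " + args[1] + " to " + args[2]);
    }
}
```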
The RPC ports are in conflict between fs.defaultFS in core-site.xml (port 8020) and dfs.namenode.servicerpc-address in hdfs-site.xml (port 8022).
Change the following property in hdfs-site.xml and restart the services:
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>0.0.0.0:8020</value>
</property>