Reputation: 21
I am using Hadoop 3.2.2 in a cluster based on Windows 10 and on which the high availability is configured on HDFS using the Quorum Journal manager.
The system works just fine, I am able to transition nodes from active to standby state without issues, but I often get the following error message :
java.io.IOException: Exception during image upload
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:315)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1300(StandbyCheckpointer.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:480)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:383)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:403)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:502)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:399)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error writing request body to server
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:295)
... 6 more
Caused by: java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3597)
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3580)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:377)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:321)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:295)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:230)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748
My cluster setup is the following
A: Namenode, Zookeeper, ZKFC, Journal
B: Namenode, Zookeeper, ZKFC, Journal
C: Namenode, Zookeeper, ZKFC
D: Journal, Datanode
E,F,G....: Datanode
Here is my hdfs-site configuration
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new nameservice</description>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>A,B,C</value>
<description>Unique identifiers for each NameNode in the
nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.A</name>
<value>A:8020</value>
<description>RPC address for NameNode 1, it is necessary to use the real host name of the machine instead of an aliases</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.B</name>
<value>B:8020</value>
<description>RPC address for NameNode 2</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.C</name>
<value>C:8020</value>
<description>RPC address for NameNode 3</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.A</name>
<value>A:9870</value>
<description>HTTP address for NameNode 1</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.B</name>
<value>B:9870</value>
<description>HTTP address for NameNode 2</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.C</name>
<value>C:9870</value>
<description>HTTP address for NameNode 3</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://A:8485;B:8485;D:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(C:/mylocation/stop-namenode.bat $target_host)</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>C:/hadoop-3.2.2/data/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>A:2181,B:2181,C:2181</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.2.2/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.2.2/data/dfs/datanode</value>
</property>
<property>
<name>dfs.namenode.safemode.threshold-pct</name>
<value>0.5f</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
</configuration>
Does someone got the same issue ? Am I missing something here ?
Upvotes: 1
Views: 603
Reputation: 1
Not sure if this issue is resolved. It may be because of this change https://issues.apache.org/jira/browse/HADOOP-16886. Solution would be to add the desired value for hadoop.http.idle_timeout.ms in core-site.xml.
Upvotes: 0