2D_

Reputation: 601

Hadoop, Socket Timeout Error

I am trying to run TeraSort on Hadoop, and I get a socket timeout exception, shown below.

[hadoop@master mapreduce]$ hadoop jar $(ls hadoop-mapreduce-examples-2*.jar) teragen 100000000 /terasort/in
16/10/08 21:30:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/08 21:30:17 INFO client.RMProxy: Connecting to ResourceManager at master/10.90.110.160:8032
16/10/08 21:30:33 INFO terasort.TeraSort: Generating 100000000 using 2
16/10/08 21:30:33 INFO mapreduce.JobSubmitter: number of splits:2
16/10/08 21:30:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475979237007_0002
16/10/08 21:30:34 INFO impl.YarnClientImpl: Submitted application application_1475979237007_0002
16/10/08 21:30:34 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1475979237007_0002/
16/10/08 21:30:34 INFO mapreduce.Job: Running job: job_1475979237007_0002
16/10/08 21:38:25 INFO mapreduce.Job: Job job_1475979237007_0002 running in uber mode : false
16/10/08 21:38:25 INFO mapreduce.Job:  map 0% reduce 0%
16/10/08 21:38:25 INFO mapreduce.Job: Job job_1475979237007_0002 failed with state FAILED due to: Application application_1475979237007_0002 failed 2 times due to Error launching appattempt_1475979237007_0002_000002. Got exception: org.apache.hadoop.net.ConnectTimeoutException: Call From master.someplace.net/69.172.201.153 to 69.172.201.153:35751 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=69.172.201.153/69.172.201.153:35751]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy32.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=69.172.201.153/69.172.201.153:35751]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1446)
    ... 9 more
. Failing the application.
16/10/08 21:38:25 INFO mapreduce.Job: Counters: 0

I have checked my three nodes, and they all appear to be healthy:

Live datanodes (3):

Name: 10.90.110.160:50010 (master.hadoop.mids.lulz.bz)
Hostname: 69.172.201.153
Decommission Status : Normal
Configured Capacity: 105554829312 (98.31 GB)
DFS Used: 831488 (812 KB)
Non DFS Used: 5449568256 (5.08 GB)
DFS Remaining: 100104429568 (93.23 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 08 21:47:42 CDT 2016


Name: 10.90.110.169:50010 (slave2.hadoop.mids.lulz.bz)
Hostname: 69.172.201.153
Decommission Status : Normal
Configured Capacity: 105554829312 (98.31 GB)
DFS Used: 831488 (812 KB)
Non DFS Used: 5448441856 (5.07 GB)
DFS Remaining: 100105555968 (93.23 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 08 21:47:42 CDT 2016


Name: 10.90.110.165:50010 (slave1.hadoop.mids.lulz.bz)
Hostname: 69.172.201.153
Decommission Status : Normal
Configured Capacity: 105554829312 (98.31 GB)
DFS Used: 831488 (812 KB)
Non DFS Used: 5448441856 (5.07 GB)
DFS Remaining: 100105555968 (93.23 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 08 21:47:42 CDT 2016

Where else should I look for a solution? I am completely lost here... Thanks in advance!

Upvotes: 0

Views: 4644

Answers (1)

Kris

Reputation: 1734

I think the system is using the default timeout period while the DFSClient communicates with the datanodes. Increasing dfs.datanode.socket.write.timeout and dfs.socket.timeout might help.

Change or add the configuration below (usually in hdfs-site.xml on every node, followed by a restart of HDFS) to increase the timeouts:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>2000000</value>
</property>

<property>
  <name>dfs.socket.timeout</name>
  <value>2000000</value>
</property>

Further, the logs show the system trying to connect to 69.172.201.153. Is that the correct IP? Note that all three datanodes in your report list the same hostname, 69.172.201.153, which suggests a name-resolution problem rather than (or in addition to) a timeout issue.
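To check whether each node's hostname actually resolves to the IP you expect, a quick sketch (hostnames are taken from the datanode report in the question; adjust them to your cluster):

```python
import socket

# Hostnames from the question's datanode report -- replace with your own.
nodes = [
    "master.hadoop.mids.lulz.bz",
    "slave1.hadoop.mids.lulz.bz",
    "slave2.hadoop.mids.lulz.bz",
]

for host in nodes:
    try:
        # Forward resolution: what IP does this name map to on this machine?
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        ip = f"unresolvable ({exc})"
    print(f"{host} -> {ip}")
```

Run this on the master: each node should resolve to its own distinct private IP (10.90.110.160, .165, .169). If several names resolve to the same address, check /etc/hosts and DNS on every node.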

Upvotes: 3
