UnknownHostException on tasktracker in Hadoop cluster

Question

I have set up a pseudo-distributed Hadoop cluster (with a jobtracker, a tasktracker, and a namenode all on the same box) per tutorial instructions, and it's working fine. I am now trying to add a second node to this cluster as another tasktracker.

When I examine the logs on Node 2, everything looks fine except for the tasktracker log, which shows an infinite loop of the error message listed below. It seems that the TaskTracker is trying to use the hostname SSP-SANDBOX-1.mysite.com rather than the IP address. This hostname is not in /etc/hosts, so I'm guessing that is where the problem is coming from. I do not have root access, so I cannot add it to /etc/hosts myself.

Is there any property or configuration I can change so that it will stop trying to connect using the hostname?

Thanks very much,

2011-01-18 17:43:22,896 ERROR org.apache.hadoop.mapred.TaskTracker: 
Caught exception: java.net.UnknownHostException: unknown host: SSP-SANDBOX-1.mysite.com
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1033)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)

bajafresh4life · Accepted Answer

This blog posting might be helpful:

http://western-skies.blogspot.com/2010/11/fix-for-exceeded-maxfaileduniquefetches.html

The short answer is that Hadoop performs reverse hostname lookups even if you specify IP addresses in your configuration files. For Hadoop to work in your environment, SSP-SANDBOX-1.mysite.com must resolve to the IP address of that machine, and the reverse lookup for that IP address must resolve back to SSP-SANDBOX-1.mysite.com.

So you'll need to talk to whoever is administering those machines to either fudge the hosts file or to provide a DNS server that will do the right thing.
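Before talking to the administrators, it may help to confirm what each node actually sees. The following is a minimal sketch, not anything taken from Hadoop itself: the class name DnsRoundTrip and its helper methods are made up for illustration, and it only uses the standard java.net.InetAddress API. It does a forward lookup of a hostname and a reverse lookup of the resulting address through the JVM's resolver, then reports whether the two agree.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper class for illustration; not part of Hadoop.
public class DnsRoundTrip {

    /** Forward lookup: hostname -> IP address string, or empty if the name is unknown. */
    static Optional<String> forwardLookup(String hostname) {
        try {
            return Optional.of(InetAddress.getByName(hostname).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    /** Reverse lookup: IP address string -> canonical hostname, or empty if none is found. */
    static Optional<String> reverseLookup(String ip) {
        try {
            InetAddress addr = InetAddress.getByName(ip);
            String name = addr.getCanonicalHostName();
            // getCanonicalHostName() falls back to the textual IP when the reverse lookup fails.
            return name.equals(addr.getHostAddress()) ? Optional.empty() : Optional.of(name);
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    /** Compares two hostnames case-insensitively, ignoring a trailing dot. */
    static boolean sameHost(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    private static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }

    public static void main(String[] args) {
        // Pass the hostname from the error message as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";

        Optional<String> ip = forwardLookup(host);
        if (!ip.isPresent()) {
            System.out.println("Forward lookup failed: " + host + " does not resolve");
            return;
        }
        System.out.println("Forward: " + host + " -> " + ip.get());

        Optional<String> name = reverseLookup(ip.get());
        if (!name.isPresent()) {
            System.out.println("Reverse lookup failed: no hostname for " + ip.get());
            return;
        }
        System.out.println("Reverse: " + ip.get() + " -> " + name.get());
        System.out.println(sameHost(host, name.get())
                ? "Forward and reverse lookups agree"
                : "Mismatch between forward and reverse lookups");
    }
}
```

Running it on each node, for example with `java DnsRoundTrip SSP-SANDBOX-1.mysite.com`, should show whether the forward lookup, the reverse lookup, or both are failing, which is useful detail to hand to whoever manages the hosts file or DNS.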
