Ken

Reputation: 33

HBase HMaster fails to start due to Zookeeper UnknownHostException

I am running HBase 2.0.4 with Hadoop 2.8.5 on CentOS 7, with 1 master node and 4 slave nodes. I tried the same setup with HBase 2.1.3, and the same problem occurs.

The HMaster fails to start because the ZooKeeper client cannot resolve the region server host names, as this error log shows:

2019-03-29 13:58:34,961 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=node0.ken:2181, node1.ken:2181, node2.ken:2181, node3.ken:2181, node4.ken:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@165e389b
2019-03-29 13:58:34,965 WARN  [main] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection
java.net.UnknownHostException:  node1.ken: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:131)

My config files look as follows:


---- hbase-site.xml ----

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node0.ken:9000/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>zookeeper.session.timeout</name>
    <value>1200000</value>
  </property>

  <property>
    <name>hbase.zookeeper.session.timeout</name>
    <value>1200000</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>hdfs://node0.ken:9000/zookeeper</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node0.ken, node1.ken, node2.ken, node3.ken, node4.ken</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>

  <property>
    <name>zookeeper.znode.parent</name>
    <value>/node0.ken</value>
  </property>
</configuration>

---- regionservers ----

node1.ken
node2.ken
node3.ken
node4.ken

---- /etc/hosts ----

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.158.57.150  node0.ken node0 master-node
10.158.57.151  node1.ken node1
10.158.57.152  node2.ken node2
10.158.57.153  node3.ken node3
10.158.57.154  node4.ken node4

All hosts can ping each other, and SELinux and firewalld are disabled. I can successfully telnet to node1:2181 from all the other nodes. I have also tried the steps suggested here, but ZooKeeper still fails to resolve the host: https://jayunit100.blogspot.com/2013/05/debugging-hbase-installation.html

Am I missing something? Where else does ZooKeeper get its host resolution from?


UPDATE: 2019-03-29

The problem seems to be in the ZooKeeper client that HBase uses (zookeeper.version=3.4.10), and it might be related to this bug: https://issues.apache.org/jira/browse/ZOOKEEPER-2982 Does anyone know how to replace the ZooKeeper client that HBase uses with a newer one?

UPDATE: 2019-04-01

I tried replacing hbase/lib/zookeeper-3.4.10.jar with hbase/lib/zookeeper-3.4.13.jar. The same error occurs, just from a different API call:

2019-03-29 19:09:46,880 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=node0.ken:2181, node1.ken:2181, node2.ken:2181, node3.ken:2181, node4.ken:2181 sessionTimeout=1200000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@336880df
2019-03-29 19:09:46,912 INFO  [main-SendThread( node1.ken:2181)] zookeeper.ClientCnxn: Opening socket connection to server  node1.ken:2181. Will not attempt to authenticate using SASL (unknown error)
2019-03-29 19:09:46,917 WARN  [main-SendThread( node1.ken:2181)] zookeeper.ClientCnxn: Session 0x0 for server  node1.ken:2181, unexpected error, closing socket connection and attempting reconnect
java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:101)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
        at org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:277)
        at org.apache.zookeeper.ClientCnxnSocketNIO.connect(ClientCnxnSocketNIO.java:287)
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1021)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1064)

I tried compiling a small Java class to test these functions:

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.Socket;
import sun.nio.ch.Net;
import java.net.UnknownHostException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class GetIPHostname {

    public static void main(String[] args) {

        InetAddress ip;
        String hostname;
        try {
            ip = InetAddress.getLocalHost();
            hostname = ip.getHostName();
            System.out.println("Your current IP address : " + ip);
            System.out.println("Your current Hostname : " + hostname);
            //List<String> hostname_list = Arrays.asList("node0", "node1", "node2", "node3", "node4");
            List<String> hostname_list = Arrays.asList("node0.ken", "node1.ken", "node2.ken", "node3.ken", "node4.ken");

            for (String cur_hostname : hostname_list) {
                String ip_address = InetAddress.getByName(cur_hostname).getHostAddress();

                System.out.println("Hostname resolved: "+cur_hostname+" -> "+ip_address);
                final Socket socket = new Socket();
                SocketAddress address = new InetSocketAddress(cur_hostname, 2181);
                try {
                    // Same JDK-internal check that throws UnresolvedAddressException
                    // in the second stack trace.
                    InetSocketAddress isa = Net.checkAddress(address);
                    System.out.println("ISA: " + isa.getAddress() + " -> " + isa.getPort());

                    // Same lookup that StaticHostProvider calls in the first stack trace.
                    InetAddress[] addresses = InetAddress.getAllByName(cur_hostname);
                    for (InetAddress cur_ia : addresses) {
                        System.out.println("InetAddress: " + cur_ia.getHostAddress());
                    }

                    // Connect to the ZooKeeper client port on the remote host.
                    socket.connect(address);
                    socket.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

        } catch (UnknownHostException e) {

            e.printStackTrace();
        }
    }
}

... and both APIs are able to resolve the addresses using the hosts file and even connect to the Zookeeper servers at port 2181:

[root@node1 test]# java GetIPHostname
Your current IP address : node1.ken/10.158.57.151
Your current Hostname : node1.ken
Hostname resolved: node0.ken -> 10.158.57.150
ISA: node0.ken/10.158.57.150 -> 2181
InetAddress: 10.158.57.150
Hostname resolved: node1.ken -> 10.158.57.151
ISA: node1.ken/10.158.57.151 -> 2181
InetAddress: 10.158.57.151
Hostname resolved: node2.ken -> 10.158.57.152
ISA: node2.ken/10.158.57.152 -> 2181
InetAddress: 10.158.57.152
Hostname resolved: node3.ken -> 10.158.57.153
ISA: node3.ken/10.158.57.153 -> 2181
InetAddress: 10.158.57.153
Hostname resolved: node4.ken -> 10.158.57.154
ISA: node4.ken/10.158.57.154 -> 2181
InetAddress: 10.158.57.154

Upvotes: 2

Views: 1094

Answers (1)

qing huang

Reputation: 11

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>node0.ken, node1.ken, node2.ken, node3.ken, node4.ken</value>
</property>

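Try removing the spaces after the commas in the quorum value. The log prints the failing host with a leading space (" node1.ken"), which suggests the space is being passed to the resolver as part of the host name. Change the property to: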

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>node0.ken,node1.ken,node2.ken,node3.ken,node4.ken</value>
</property>
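
To see why the spaces matter, here is a minimal sketch of what happens when a comma-separated host list is split without trimming. It is only an illustration of the effect, not HBase's or ZooKeeper's actual parsing code; the class name QuorumWhitespaceDemo and the methods splitRaw and splitTrimmed are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows how whitespace in a comma-separated
// quorum string ends up inside the individual host names.
public class QuorumWhitespaceDemo {

    // Splits on commas only, keeping any whitespace around each host.
    static List<String> splitRaw(String quorum) {
        List<String> hosts = new ArrayList<>();
        for (String part : quorum.split(",")) {
            hosts.add(part);
        }
        return hosts;
    }

    // Splits on commas and trims each host, dropping empty entries.
    static List<String> splitTrimmed(String quorum) {
        List<String> hosts = new ArrayList<>();
        for (String part : quorum.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // The quorum value exactly as written in the question's hbase-site.xml.
        String quorum = "node0.ken, node1.ken, node2.ken, node3.ken, node4.ken";

        for (String host : splitRaw(quorum)) {
            System.out.println("raw:     [" + host + "]");
        }
        for (String host : splitTrimmed(quorum)) {
            System.out.println("trimmed: [" + host + "]");
        }
    }
}
```

With the raw split, every host except the first starts with a space, which matches the " node1.ken" shown in the log. The test class in the question passes clean names such as "node1.ken" to InetAddress, which would explain why it resolves them without error.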

Upvotes: 1
