Reputation: 2689
I have HBase & HDFS set up and working in pseudo-distributed mode (on Mac OS X). I also have a simple Java application. It works when used locally; I would like to make it work remotely. The server is hidden behind a router, and all necessary ports have been forwarded.
When I try to connect remotely I get:
...
12/01/25 23:21:15 INFO zookeeper.ClientCnxn: Session establishment complete on server
remote.host.com/remoteip:53058, sessionid = 0x13516f179a30005, negotiated timeout = 40000
12/01/25 23:21:36 INFO client.HConnectionManager$HConnectionImplementation: getMaster attempt
0 of 10 failed; retrying after sleep of 1000
java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.52.53/192.168.52.53:58023]
To me this means that ZooKeeper connects, but hands the client the wrong master address: 1) it's a local (NAT-internal) address, and 2) it's on the wrong port.
I tried fixing issue #1 by setting the remote address in HDFS core-site.xml (fs.default.name) and in hbase-site.xml (hbase.rootdir).
HDFS won't bind to the remote address. If HDFS is bound to the local address and works, HBase will not connect when given the remote address in hbase-site.xml (the IP and the port forward definitely work; I checked with telnet).
I played around with /etc/hosts: regardless of whether ping -c 1 $(hostname)
returns the local or the remote address, both HDFS and HBase start only when bound to the local address.
I also tried fixing issue #2 by setting hbase.master.port in hbase-site.xml; no matter what value I set, the HBase master binds to a random port.
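For reference, these are the properties involved. The hostname and ports below are placeholders for my setup, not known-good values:

```xml
<!-- core-site.xml (HDFS) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://remote.host.com:9000</value>
</property>

<!-- hbase-site.xml -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://remote.host.com:9000/hbase</value>
</property>
<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>
```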
I've wasted tons of time trying to get this right, checked all possible sources and tried every possible combination.
Upvotes: 3
Views: 1919
Reputation: 51319
The usual problem in this situation is that you are expecting that you can access HBase via a single IP address from outside a NAT firewall. While this is probably possible, it is very hard to set up and almost certainly unsupported.
When a client connects to HBase, the first thing it does is connect to ZooKeeper to determine which machine hosts the tables it is looking for (or which machine is the current Master, if you are performing admin operations, as seems to be the case here).
Then the client connects directly to those remote machines. If the remote machines (the HBase RegionServers, specifically) are behind a NAT router and report themselves to ZooKeeper using their internal IPs, then there is no way for a machine outside the router to reach a RegionServer's internal IP.
The only reasonable way to make HBase work through NAT is to funnel all outside requests through a proxy. There are two options for that: Thrift and REST. Much more on proxies here: http://ofps.oreilly.com/titles/9781449396107/clients.html
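As a sketch of the REST route, run the gateway on a machine inside the NAT and forward only its port. The port and table/row names below are illustrative, not from your setup:

```shell
# On the HBase node (inside the NAT): start the REST gateway
$HBASE_HOME/bin/hbase-daemon.sh start rest

# Forward only the gateway's port (8080 by default) through the router.
# Outside clients then speak HTTP to the proxy instead of connecting
# directly to RegionServers:
curl http://remote.host.com:8080/mytable/row1
```

This keeps the internal RegionServer addresses invisible to outside clients, at the cost of routing all traffic through the single gateway.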
Incidentally, you almost never want this setup: all client machines should be able to communicate directly with the RegionServers, so that you don't end up with a bottleneck at your HBase proxy server.
Upvotes: 1