Reputation: 1547
I'm processing HBase tables from Spark (EMR, in YARN mode). Actually, PySpark, but I don't think that's important. I call HBase through a separate Thrift service from outside the HBase cluster.
It looks like I was able to connect to the Thrift servers, but I have some issue with ZooKeeper (because the error points me to ZooKeeper port 2181).
Why does that happen, and how can I fix it?
17/08/02 20:21:31 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
17/08/02 20:21:31 INFO ZooKeeper: Client environment:java.compiler=<NA>
17/08/02 20:21:31 INFO ZooKeeper: Client environment:os.name=Linux
17/08/02 20:21:31 INFO ZooKeeper: Client environment:os.arch=amd64
17/08/02 20:21:31 INFO ZooKeeper: Client environment:os.version=4.4.35-33.55.amzn1.x86_64
17/08/02 20:21:31 INFO ZooKeeper: Client environment:user.name=hadoop
17/08/02 20:21:31 INFO ZooKeeper: Client environment:user.home=/home/hadoop
17/08/02 20:21:31 INFO ZooKeeper: Client environment:user.dir=/home/hadoop/data
17/08/02 20:21:31 INFO ZooKeeper: Initiating client connection, connectString=thrift-internal.production.k8s.prod.node.io:2181 sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@2818bc0e
17/08/02 20:21:31 INFO ClientCnxn: Opening socket connection to server ip-172-23-115-152.us-west-2.compute.internal/172.23.115.152:2181. Will not attempt to authenticate using SASL (unknown error)
Upvotes: 1
Views: 786
Reputation: 1547
As an HBase client, you have to connect to two services: the HBase service itself (directly or through Thrift) and the ZooKeeper service (which usually runs on the same server as the HBase Master).
When you connect to HBase through Thrift servers, the library uses the same host address to communicate with ZooKeeper.
hbase = happybase.Connection(host, port=port, timeout=10000)
However, this ZooKeeper address is not correct if the Thrift servers run on separate hardware/IPs.
So, you have to connect to Thrift using the regular code
hbase = happybase.Connection(host, port=port, timeout=10000)
but specify HBaseHost (ZooKeeper) via the hbase.zookeeper.quorum parameter when you connect to a table:
conf = {"hbase.zookeeper.quorum": HBaseHost, "hbase.mapreduce.inputtable": table}
rdd = spark_context.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf
)
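The keyConv and valueConv values are not defined in the snippet above. In Spark's bundled HBase example (hbase_inputformat.py) they typically point to converter classes shipped in the spark-examples JAR; a sketch, assuming those example converters are on your classpath:

```python
# Converter classes from Spark's bundled HBase example (spark-examples JAR).
# They convert HBase's Java key/value types into Python strings.
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
```

If you use these, make sure the spark-examples JAR is passed to your job (e.g. via --jars), otherwise the converters won't be found at runtime.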
The ZooKeeper address might also be specified in hbase-site.xml as the hbase.zookeeper.quorum property. Then you need to include this config file in your HBase client's settings.
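For reference, a minimal hbase-site.xml entry for this property might look like the following (the quorum hosts are placeholders; use your actual ZooKeeper hosts):

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk-host-1,zk-host-2,zk-host-3</value>
  </property>
</configuration>
```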
Upvotes: 1
Reputation: 9
Do you have access to your cluster manager (e.g., Cloudera Manager)? You can check there whether the ZooKeeper service is running properly or whether any error messages are popping up.
You can check with sudo service zookeeper status,
or you can also telnet to the ZooKeeper host:
root@host:~# telnet localhost 2181
Trying 127.0.0.1...
Connected to myhost.
Escape character is '^]'.
stats
Zookeeper version: 3.4.3-cdh4.0.1--1, built on 06/28/2012 23:59 GMT
If you are running in standalone mode, ZooKeeper is a JVM process and you can check its status with jps,
which will display a list of JVM processes; ZooKeeper shows up as HQuorumPeer next to its process ID.
Upvotes: 0