Reputation: 11
We have a Python 3 application that connects to HBase and fetches data.
The connection worked fine over the Kerberos-secured HBase Thrift binary protocol (with TSocket) until the Hadoop team moved the cluster to Cloudera and Cloudera Manager, which starts the Kerberos-secured HBase Thrift server in HTTPS mode.
Now that the transport has changed from TSocket to HTTP/HTTPS, the Python code can no longer authenticate: we cannot get the HTTP client to work with SASL Kerberos.
The Python version in use is 3.6.8,
and the package versions are:
thrift=0.13.0
hbase-thrift=0.20.4
pure_sasl=0.5.1
Working code in TSocket mode:
############
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import *
import jprops
from subprocess import call, check_output

# read cluster.properties
with open('/data/properties/cluster.properties') as fp:
    properties = jprops.load_properties(fp)


# kerberos ticket
def kerberos_ticket():
    principal = properties["principal"]
    # keyTab was not defined in the original snippet; the property name is assumed
    keyTab = properties["keyTab"]
    # pass the arguments as a list so that no shell quoting is needed
    call(["kinit", "-kt", keyTab, principal])
    return


# Hbase connection
def hbase_connection():
    # get hbase data
    thriftHost = properties["thriftHost"]
    hbaseService = properties["hbaseService"]
    Tsock = TSocket.TSocket(thriftHost, 9090)
    Tsock.setTimeout(2000000)  # Milliseconds timeout
    # wrap the socket in a SASL/GSSAPI (Kerberos) transport
    transport = TTransport.TSaslClientTransport(
        Tsock,
        host=thriftHost,
        service=hbaseService,
        mechanism='GSSAPI'
    )
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    return client, transport


# get kerberized ticket
kerberos_ticket()
client, transport = hbase_connection()
transport.open()
print(client.getTableNames())
###########
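The kinit step above ignores the exit status, so a failed login only shows up later as an authentication error. A minimal sketch of a stricter variant follows, assuming MIT Kerberos, where `klist -s` exits with status 0 when the credentials cache holds a valid ticket. The function names `has_valid_ticket` and `ensure_ticket` and the injectable `run` parameter are hypothetical and not part of the original code.

```python
import subprocess


def has_valid_ticket(run=subprocess.call):
    """Return True if `klist -s` reports a readable, unexpired credentials cache."""
    try:
        return run(["klist", "-s"]) == 0
    except FileNotFoundError:
        # klist is not installed or not on PATH
        return False


def ensure_ticket(keytab, principal, run=subprocess.call):
    """Run kinit only when no valid ticket exists; raise if kinit fails.

    Returns True if kinit was run, False if a valid ticket was already there.
    """
    if has_valid_ticket(run):
        return False
    status = run(["kinit", "-kt", keytab, principal])
    if status != 0:
        raise RuntimeError("kinit failed with exit status %d" % status)
    return True
```

With the properties file from the question, this could be called as `ensure_ticket(keyTab, properties["principal"])` in place of `kerberos_ticket()`.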
I found a comment in the TTransport.py code suggesting that TSaslClientTransport is meant to wrap a TSocket:
https://github.com/apache/thrift/blob/master/lib/py/src/transport/TTransport.py
TTransport.TSaslClientTransport
"transport: an underlying transport to use, typically just a TSocket"
We tried to use
https://github.com/apache/thrift/blob/master/lib/py/src/transport/THttpClient.py
THttpClient.THttpClient(url)
but we could not use it as the underlying transport of TTransport.TSaslClientTransport for SASL Kerberos.
Please advise whether Python can be used at all with a Cloudera-managed, Kerberos-secured HBase Thrift server in HTTPS mode, or suggest an alternative way to connect to Kerberos-secured HBase from Python.
PS: I went through this question about a similar issue, but it had no concrete solution:
Python program to connect to HBase via thrift server in Http mode
Thanks in advance,
Manjil
Upvotes: 0
Views: 1235
Reputation: 11
I have found a solution for using Kerberos with the HBase Thrift server in HTTP mode.
### Python code
# Prerequisite: kinit has been run and a Kerberos ticket is available for the user
# HBase Thrift is running in secure mode over the HTTP protocol
# The Python code uses the Kerberos ticket from the local ticket cache,
# adds the Kerberos context to the HTTP header,
# and performs HBase client operations such as getting table names, scanning a table, etc.
#
# Important: the session opened by the httpClient transport is valid for one call only;
# for the next HBase operation, get a new Kerberos context (krb_context), set the header and open the session again
##
import kerberos
from thrift import Thrift
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
import ssl


def kerberos_auth():
    hbaseService = "<hbase>/<HOST>@<DOMAIN.COM>"
    # the service can be hbase or HTTP, depending on the HBase Thrift configuration
    clientPrincipal = "<user>@<DOMAIN.COM>"
    __, krb_context = kerberos.authGSSClientInit(hbaseService, principal=clientPrincipal)
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    # send the Kerberos token in the SPNEGO "Negotiate" Authorization header
    headers = {'Authorization': 'Negotiate ' + negotiate_details,
               'Content-Type': 'application/binary'}
    return headers


httpClient = THttpClient.THttpClient(
    'https://<THRIFT_HOST>:9090/',
    cert_file='<client cert file path>.crt',
    key_file='<client cert key file path>.key',
    # use an unverified context only if no SSL verification is required
    ssl_context=ssl._create_unverified_context()
)

# for new session start
httpClient.setCustomHeaders(headers=kerberos_auth())
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
httpClient.open()
client = Client(protocol)
# for new session end
client.getTableNames()
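Since the code comments above say a fresh Kerberos context is needed for every HBase operation, the "new session" steps can be wrapped so they are not repeated by hand. This is only a sketch under stated assumptions: the class name `FreshAuthClient` is hypothetical, it expects a transport exposing `setCustomHeaders()` and `open()` as THttpClient does, and it assumes that calling `open()` again before each request is acceptable for the server.

```python
class FreshAuthClient:
    """Proxy that refreshes the transport's auth headers before every client call."""

    def __init__(self, client, transport, make_headers):
        self._client = client              # e.g. the Hbase Client from above
        self._transport = transport        # e.g. the THttpClient instance
        self._make_headers = make_headers  # e.g. the kerberos_auth function

    def __getattr__(self, name):
        # only reached for names not defined on the proxy itself
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def call_with_fresh_headers(*args, **kwargs):
            # new Negotiate header and new session for each operation
            self._transport.setCustomHeaders(self._make_headers())
            self._transport.open()
            return target(*args, **kwargs)

        return call_with_fresh_headers
```

With the names from the code above, usage would look like `hbase = FreshAuthClient(client, httpClient, kerberos_auth)` followed by `hbase.getTableNames()`; each call then gets its own header.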
Thank you,
Manjil
Upvotes: 1