Reputation: 1529
I am stuck trying to set up Python code that can run Impala queries against a remote Impala server.
On my local Windows machine I am using an ODBC driver that is already set up and successfully retrieves Impala data in Tableau. The Hadoop environment we use is Kerberised. For a successful connection test in the ODBC Administrator, we need SASL and a trusted .pem certificate.
I have already tried connecting using multiple libraries, but I'm not sure how to set connection properties and which I would need.
I tried following this guide to get started.
I experimented with pyodbc by setting Driver, Host, Port, Database, AuthMech=3, UseSASL=1, UID, PWD, SSL=1 in my connection string, but I always end up with:
pyodbc.Error: ('HY000', '[HY000] [Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: No more data to read. (100) (SQLDriverConnect); [HY000] [Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: No more data to read. (100)')
I am not sure how to specify the certificate, so that might be what causes this error.
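For reference, the DSN-less attempt described above can be sketched as a small helper that assembles the connection string. This is only an illustration: the helper name `build_impala_conn_string` is hypothetical, the default driver name is an assumption that has to match the entry shown in the ODBC Administrator, and `TrustedCerts` is assumed to be the key the installed Cloudera driver uses for the path to the .pem file, so check the driver's own documentation before relying on it.

```python
def build_impala_conn_string(host, port, database, uid, pwd, trusted_certs,
                             driver="Cloudera ODBC Driver for Impala"):
    """Assemble a DSN-less ODBC connection string (hypothetical helper).

    The key names mirror the ones tried in the question; TrustedCerts
    (path to the .pem file) is an assumption about the driver's settings.
    """
    parts = {
        "Driver": "{" + driver + "}",   # driver name as registered in ODBC
        "Host": host,
        "Port": port,
        "Database": database,
        "AuthMech": 3,                  # user name and password
        "UseSASL": 1,
        "UID": uid,
        "PWD": pwd,
        "SSL": 1,
        "TrustedCerts": trusted_certs,  # assumed key for the trusted .pem
    }
    # Dicts keep insertion order (Python 3.7+), so the keys appear as listed.
    return ";".join("{}={}".format(key, value) for key, value in parts.items())


# Possible usage (requires pyodbc and a reachable server):
# conn = pyodbc.connect(build_impala_conn_string(...), autocommit=True)
```

The helper does no escaping, so values that contain a semicolon or braces would need extra handling.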
I also looked at impyla, but I'm not certain how to set the connection parameters there either.
Can someone shed any light on how to run queries from a local Windows user against a Kerberised Impala server? Which parameters need to be set and what values do they expect? Code examples are appreciated. I do not care which library is used, although it seems that I can't install thrift-sasl. Please ask for any additional information you need and I will update my question.
Upvotes: 1
Views: 7420
Reputation: 1529
It turns out I could reuse my previously configured ODBC driver/connection. Supplying the DSN I found in the ODBC Administrator tool solved my issue. I ended up using pyodbc.
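import pyodbc

# Fill in your own values; 'dsn' is the name shown in the ODBC Administrator tool.
cfg = {'username': '...', 'password': '...', 'database': '...', 'dsn': '...'}
connString = 'DSN={dsn};UID={username};PWD={password};Database={database}'.format(**cfg)
# Open the connection in autocommit mode (pyodbc otherwise starts a transaction implicitly).
conn = pyodbc.connect(connString, autocommit=True)
cursor = conn.cursor()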
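Once a cursor is available, running a query could look roughly like the sketch below. The helper `fetch_rows` is a hypothetical name, not part of pyodbc; it only relies on the standard DB-API cursor methods (`execute`, `description`, `fetchall`), and the table name in the usage comment is made up.

```python
def fetch_rows(cursor, sql, params=None):
    """Run a query on a DB-API cursor and return the rows as dicts.

    Hypothetical helper: uses only execute(), description and fetchall(),
    which pyodbc cursors provide.
    """
    if params:
        cursor.execute(sql, params)
    else:
        cursor.execute(sql)
    # description holds one tuple per result column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Possible usage with the cursor created above (table name is made up):
# rows = fetch_rows(cursor, "SELECT * FROM some_table LIMIT 10")
```

Returning plain dicts keeps the result independent of the driver's own row type, at the cost of loading the whole result set into memory.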
Upvotes: 4