Reputation: 2022
Case: I have Hive on a cloudera platform. There is a database on Hive that I want to access using python client from my computer. I read a similar SO question but its using pyhs2 which I am unable to install on the remote server. And this SO question too uses Thrift but I cant seem to install it either.
Code: After following the documentation, when I execute the following program it gives me an error.
import pyodbc, sys, os
pyodbc.autocommit=True
con = pyodbc.connect("DSN=default",driver='SQLDriverConnect',autocommit=True)
cursor = con.cursor()
cursor.execute("select * from fb_mpsp")
Error: ssh://ashish@ServerIPAddress/home/ashish/anaconda/bin/python2.7 -u /home/ashish/PyCharm_proj/hdfsConnect/home/ashish/PyCharm_proj/hdfsConnect/Hive_connect/hive_connect.py
Traceback (most recent call last): File "/home/ashish/PyCharm_proj/hdfsConnect/home/ashish/PyCharm_proj/hdfsConnect/Hive_connect/hive_connect.py", line 5, in con = pyodbc.connect("DSN=default", driver='SQLDriverConnect',autocommit=True)
pyodbc.Error: ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)')
Process finished with exit code 1
Please suggest how can I solve this problem? Also I am not sure why do I have to specify the driver as SQLDriverConnect when the code will be executed using hadoop hive?
Thanks
Upvotes: 1
Views: 10759
Reputation: 1073
Try this method also to conenct and get data remotely from hive server:
connect remote server with ssh and give the cli command to access data from remote server:
ssh -o UserKnownHostsFile=/dev/null -o ConnectTimeout=90 -o StrictHostKeyChecking=no shashanks@remote_host 'hive -e "select * from DB.testtable limit 5;" >/home/shashanks/testfile'
Upvotes: 1
Reputation: 3208
This worked for me
oODBC = pyodbc.connect("DSN=Cloudera Hive DSN 64;", autocommit = True, ansi = True )
And now everything works fine.
Be sure anything is fine with you DSN using:
isql -v "Cloudera Hive DSN 64"
and replace "Cloudera Hive DSN 64" with the name you used in your odbc.ini
Also, currently I'm not able to use the kerberos authentication unless I make a ticket by hand. Impala works smoothly using kerberos keytab files
Any help about how to have hive odbc working with keytab files is appreciated.
Upvotes: 2
Reputation: 6154
If you do decide to revisit pyhs2
note that it doesn't need to be installed on the remote server, it's installed on your local client.
If you continue with pyodbc
, you need to install the ODBC driver for Hive, which you can get from Cloudera's site.
You don't need to specify the driver in your connection, it should be part of your DSN. The specifics of creating the DSN depend on your OS, but essentially you will create it using Administrative Tools -> Data Sources (Windows), install ODBC and edit /Library/ODBC/odbc.ini
(Mac), or edit /etc/odbc.ini
(Linux).
Conceptually, think of the DSN as a specification that represents all the information about the connection - it will contain the host, port, and driver information. That way in your code you don't have to specify these things and you can switch details about the database without changing your code.
# Note only the DSN name specifies the connection
import pyodbc
conn = pyodbc.connect("DSN=Hive1")
cursor = conn.cursor()
cursor.execute("select * from YYY")
Alternatively, I've updated the other question you referenced with information about how to install the thrift libraries. I think that's the way to go, if you have that option.
Upvotes: 1