Reputation: 39009
I've been stuck on this for a few hours and am having trouble making any progress. I have a remote Hadoop instance with a Hue server that I've been running Hive queries against. These work fine. I now want to run the queries against Hive directly from Python, but this is where my problems arise. I've tried running things through both Python Hive Utils and pyhs2. The former gives me:
thrift.Thrift.TApplicationException: Invalid method name: 'get_database'
The latter just times out.
I know the server is using 0.10.0-cdh4.3.0, but I don't know how to tell if it's using HiveServer or HiveServer2.
So, my question is three-fold:
Upvotes: 3
Views: 3085
Reputation: 2893
You can just use ODBC instead.
In Python:
import pyodbc
# XXX is the name of an ODBC DSN that points at the Hive server
cnxn = pyodbc.connect("DSN=XXX", autocommit=True)
cursor = cnxn.cursor()
# YYY is the table to query
cursor.execute("select * from YYY")
where XXX is a previously created DSN.
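To read the rows back after `execute`, the usual DB-API cursor methods should apply. Here is a minimal sketch, assuming the connection behaves like a standard DB-API 2.0 connection (pyodbc's does). `run_query` and its `max_rows` parameter are hypothetical names chosen for illustration, not part of pyodbc:

```python
def run_query(cnxn, sql, max_rows=10):
    """Run sql on a DB-API connection and return up to max_rows rows as tuples."""
    cursor = cnxn.cursor()
    try:
        cursor.execute(sql)
        # fetchmany caps how many rows are pulled back in one call
        return [tuple(row) for row in cursor.fetchmany(max_rows)]
    finally:
        # release the cursor even if execute() raises
        cursor.close()

# With the DSN from the snippet above (XXX and YYY are still placeholders):
#   import pyodbc
#   cnxn = pyodbc.connect("DSN=XXX", autocommit=True)
#   for row in run_query(cnxn, "select * from YYY"):
#       print(row)
```

Capping the fetch is just a precaution while testing the connection, since `select *` on a large Hive table can return a lot of data.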
For the drivers go here
When defining the DSN you have to set the port (default 10000) and whether it's HiveServer1 or HiveServer2.
To know whether it's 1 or 2 you need access to the server, so you can check which process is listening on the relevant port (netstat will give you the process number and the port, and jps -m will give you the process number and whether it's HiveServer1 or HiveServer2).
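If you'd rather not eyeball the `jps -m` output, something like the following could pick out the Hive server lines. This is only a sketch: it assumes the main class shows up in the arguments as a name ending in `HiveServer2` (HiveServer2) or `HiveServer` (the original HiveServer), and the sample output and jar path are made up for illustration. `classify_jps_line` and `find_hive_servers` are hypothetical helper names.

```python
import re

# Assumption: the JVM arguments contain the main class name, which ends in
# "HiveServer2" for HiveServer2 and in plain "HiveServer" for HiveServer1.
_HS2 = re.compile(r"\bHiveServer2\b")
_HS1 = re.compile(r"\bHiveServer\b")

def classify_jps_line(line):
    """Return (pid, server_kind) for a Hive server line of jps -m output, else None."""
    parts = line.split(None, 1)
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    pid, rest = int(parts[0]), parts[1]
    if _HS2.search(rest):
        return pid, "HiveServer2"
    if _HS1.search(rest):
        return pid, "HiveServer1"
    return None

def find_hive_servers(jps_output):
    """Classify every line of jps -m output and keep only the Hive servers."""
    hits = (classify_jps_line(line) for line in jps_output.splitlines())
    return [hit for hit in hits if hit is not None]

# Illustrative (made-up) jps -m output; on the server you would capture the
# real thing, e.g. with subprocess.check_output(["jps", "-m"]).
sample = """\
4821 RunJar /usr/lib/hive/lib/hive-service.jar org.apache.hive.service.server.HiveServer2
5120 Jps -m
"""
print(find_hive_servers(sample))
```

The process number it reports can then be matched against the one netstat shows for port 10000.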
Upvotes: 2