Reputation: 1463
I have to use my local spark to connect a remote hive with authentication.
I am able to connect via beeline.
beeline> !connect jdbc:hive2://bigdatamr:10000/default
Connecting to jdbc:hive2://bigdatamr:10000/default
Enter username for jdbc:hive2://bigdatamr:10000/default: myusername
Enter password for jdbc:hive2://bigdatamr:10000/default: ********
Connected to: Apache Hive (version 1.2.0-mapr-1703)
Driver: Hive JDBC (version 1.2.0-mapr-1703)
Transaction isolation: TRANSACTION_REPEATABLE_READ
How can I do the same from Spark? I tried both Thrift and JDBC, but neither works.
My Thrift attempt — I don't know how to pass the authentication:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder.master("yarn") \
    .appName("my app") \
    .config("hive.metastore.uris", "thrift://bigdatamr:10000") \
    .enableHiveSupport() \
    .getOrCreate()
My JDBC attempt, which throws "Method not supported":
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:hive2://bigdatamr:10000") \
    .option("dbtable", "default.tmp") \
    .option("user", "myusername") \
    .option("password", "xxxxxxx") \
    .load()
Py4JJavaError: An error occurred while calling o183.load.
: java.sql.SQLException: Method not supported
Upvotes: 4
Views: 3490
Reputation: 1
Replace the official Hive JDBC driver with the Cloudera Hive JDBC driver; that worked for me.
Driver download: https://www.cloudera.com/downloads/connectors/hive/jdbc/2-6-15.html
I uploaded it to the Databricks libraries and changed the connection code.
Here is my code:
sql = "SELECT column FROM db.table WHERE column = 'condition'"
print(sql)
print("\nget Hive data\n")
spark_df = spark.read \
    .format("jdbc") \
    .option("driver", "com.cloudera.hive.jdbc41.HS2Driver") \
    .option("url", "url") \
    .option("query", sql) \
    .load()
Here is my blog; it might help you more.
Upvotes: 0
Reputation: 2924
Apparently this is a configuration problem.
If you have access to your server's /PATH/TO/HIVE/hive-site.xml
file, copy it to your local Spark configuration folder /PATH/TO/SPARK/conf/,
then retry running your application.
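A runnable sketch of that copy step (the real paths are installation-specific placeholders; this demo substitutes temp directories so the commands execute as written):

```shell
# Stand-ins for /PATH/TO/HIVE and /PATH/TO/SPARK/conf on a real cluster.
HIVE_CONF_DIR=$(mktemp -d)
SPARK_CONF_DIR=$(mktemp -d)
printf '<configuration/>\n' > "$HIVE_CONF_DIR/hive-site.xml"

# The actual fix: put hive-site.xml where Spark looks for its configuration,
# so enableHiveSupport() picks up the remote metastore settings.
cp "$HIVE_CONF_DIR/hive-site.xml" "$SPARK_CONF_DIR/"
```

After this, a SparkSession built with enableHiveSupport() should see the remote Hive without any JDBC options.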
Upvotes: 1
Reputation: 10406
You need to specify the driver you are using in the options of spark.read:
.option("driver", "org.apache.hive.jdbc.HiveDriver")
Also, for some reason you have to specify the database in the JDBC URL and pass only the table name via the dbtable option. For some reason it does not work to simply define dbtable as database.table.
It would look like this:
jdbcDF = spark.read \
    .format("jdbc") \
    .option("driver", "org.apache.hive.jdbc.HiveDriver") \
    .option("url", "jdbc:hive2://bigdatamr:10000/default") \
    .option("dbtable", "tmp") \
    .option("user", "myusername") \
    .option("password", "xxxxxxx") \
    .load()
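The option layout above can be collected into a small helper (a sketch — the function name and argument defaults are my own, and the final load() still needs a live HiveServer2):

```python
def hive_jdbc_options(host, port, database, table, user, password):
    """Build the spark.read option map for a HiveServer2 JDBC source.

    The database goes into the JDBC URL; dbtable holds only the bare
    table name, since "database.table" in dbtable does not work here.
    """
    return {
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }

opts = hive_jdbc_options("bigdatamr", 10000, "default", "tmp",
                         "myusername", "xxxxxxx")
# jdbcDF = spark.read.format("jdbc").options(**opts).load()
```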
Upvotes: 1