Icarus

Reputation: 1463

How to connect to a remote Hive from Spark with authentication

I need to use my local Spark to connect to a remote Hive that requires authentication.

I am able to connect via beeline.

beeline> !connect jdbc:hive2://bigdatamr:10000/default
Connecting to jdbc:hive2://bigdatamr:10000/default
Enter username for jdbc:hive2://bigdatamr:10000/default: myusername
Enter password for jdbc:hive2://bigdatamr:10000/default: ********
Connected to: Apache Hive (version 1.2.0-mapr-1703)
Driver: Hive JDBC (version 1.2.0-mapr-1703)
Transaction isolation: TRANSACTION_REPEATABLE_READ

How can I do the same from Spark? I tried both Thrift and JDBC, but neither works.

My Thrift attempt (I don't know how to pass the authentication):

from pyspark.sql import SparkSession
spark = SparkSession\
    .builder.master("yarn")\
    .appName("my app")\
    .config("hive.metastore.uris", "thrift://bigdatamr:10000")\
    .enableHiveSupport()\
    .getOrCreate()

My JDBC attempt, which throws "Method not supported":

jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:hive2://bigdatamr:10000") \
    .option("dbtable", "default.tmp") \
    .option("user", "myusername") \
    .option("password", "xxxxxxx") \
    .load()
Py4JJavaError: An error occurred while calling o183.load.

: java.sql.SQLException: Method not supported

Upvotes: 4

Views: 3490

Answers (3)

Bruce Ye

Reputation: 1

Replace the official Hive JDBC driver with the Cloudera Hive JDBC driver. It works.

The driver can be downloaded here: https://www.cloudera.com/downloads/connectors/hive/jdbc/2-6-15.html

I uploaded it to the Databricks libraries and changed the connection code.

Here is my code:

  # url is your connection string, e.g. "jdbc:hive2://bigdatamr:10000/default"
  sql = "SELECT column FROM db.table WHERE column = 'condition'"
  print(sql)
  print("\nget Hive data\n")
  spark_df = spark.read \
      .format("jdbc") \
      .option("driver", "com.cloudera.hive.jdbc41.HS2Driver") \
      .option("url", url) \
      .option("query", sql) \
      .load()

Here is my blog, which might help you further:

https://blog.8owe.com/

Upvotes: 0

user1314742

Reputation: 2924

Apparently this is a configuration problem.

If you have access to the server's /PATH/TO/HIVE/hive-site.xml file, copy it to your local Spark configuration folder /PATH/TO/SPARK/conf/ and then retry running your application.
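The copy step above can be sketched as a small shell helper (a minimal sketch; the `copy_hive_conf` function name and the example paths are assumptions, not part of the original answer):

```shell
# copy_hive_conf: copy the cluster's hive-site.xml into Spark's conf dir,
# so that enableHiveSupport() picks up the remote metastore settings.
# Paths are assumptions - adjust them to your installation.
copy_hive_conf() {
  local hive_conf="$1"    # e.g. /etc/hive/conf
  local spark_home="$2"   # e.g. /opt/spark (often $SPARK_HOME)
  cp "$hive_conf/hive-site.xml" "$spark_home/conf/"
}

# Typical invocation on a cluster edge node:
# copy_hive_conf /etc/hive/conf "$SPARK_HOME"
```

After this, a plain `SparkSession.builder.enableHiveSupport().getOrCreate()` should see the remote metastore without any per-session configuration.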

Upvotes: 1

Oli

Reputation: 10406

You need to specify the driver you are using in the options of spark.read:

.option("driver", "org.apache.hive.jdbc.HiveDriver")

Also, for some reason you have to specify the database in the JDBC URL and the table name alone in the dbtable option; simply setting dbtable to database.table does not work.

It would look like this:

jdbcDF = spark.read \
    .format("jdbc") \
    .option("driver", "org.apache.hive.jdbc.HiveDriver") \
    .option("url", "jdbc:hive2://bigdatamr:10000/default") \
    .option("dbtable", "tmp") \
    .option("user", "myusername") \
    .option("password", "xxxxxxx") \
    .load()

Upvotes: 1
