Reputation: 41
I am writing a Java Spark application that needs to connect to Hive, get some basic table info, and query that table for data. I am creating a SparkSession and reading the data as shown below, but this goes through the Thrift server. Is it possible to do the same without using the Thrift server, and if so, how? Essentially I want to write a JDBC-style client that can reach Hive tables through Spark SQL, but without the Thrift server in between. Please share your thoughts and suggestions on how to approach this. Thank you.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Spark session with Hive support enabled
SparkSession spark = SparkSession
    .builder()
    .appName("Hive example")
    .enableHiveSupport()
    .getOrCreate();

// Read the table over JDBC, which goes through HiveServer2 (the Thrift server)
Dataset<Row> df = spark.read()
    .format("jdbc")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("url", "jdbc:hive2://host:port")
    .option("dbtable", "mytable")
    .option("fetchsize", "20")
    .load();

df.show();
Upvotes: 1
Views: 1550
Reputation: 2855
With Spark 2 you can try something like this:
SparkSession ss = SparkSession
    .builder()
    .appName("Hive example")
    // talk to the Hive metastore directly instead of going through HiveServer2
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate();
Note the hive.metastore.uris property; change localhost to point to your sandbox or cluster. Once ss is initialised, you can read tables like below:
Dataset<Row> df = ss.read().table("db_name.table_name");
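Once the session is wired to the metastore, you can also get basic table info and run SQL against the table directly, which is what the question is after. A minimal sketch, assuming Spark 2.x and the same placeholder db_name/table_name as above:

// Basic table info straight from the Hive metastore (no HiveServer2 involved)
ss.catalog().listTables("db_name").show();
ss.catalog().listColumns("db_name", "table_name").show();

// Query the table with plain Spark SQL
Dataset<Row> result = ss.sql("SELECT * FROM db_name.table_name LIMIT 20");
result.show();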
JDBC way (note this still connects through HiveServer2, i.e. the Thrift-based JDBC endpoint):

Dataset<Row> df2 = ss.read()
    .format("jdbc")
    .option("url", "jdbc:hive2://localhost:10000/default")
    .option("dbtable", "clicks_json")
    .load();
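If you do go the JDBC route, HiveServer2 typically also needs the driver class and credentials spelled out. A sketch along these lines, where the host, port, user and password are placeholders for your setup:

Dataset<Row> clicks = ss.read()
    .format("jdbc")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")   // Hive JDBC driver must be on the classpath
    .option("url", "jdbc:hive2://localhost:10000/default")
    .option("dbtable", "clicks_json")
    .option("user", "hive")                                 // placeholder credentials
    .option("password", "")
    .option("fetchsize", "20")
    .load();
clicks.show();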
Hope this helps. Cheers.
Upvotes: 1