breezymri

Reputation: 4353

How to set up a Hive database connection inside Spark

I'm new to Spark and Hive. Currently I can run Spark 1.5.2, and I also have access to Hive from the command line. I want to programmatically connect to the Hive database, run a query, and extract the data into a DataFrame, all inside Spark. I imagine this sort of workflow is pretty standard, but I have no idea how to do it.

Right now I know I can get a HiveContext in Spark:

import org.apache.spark.sql.hive.HiveContext;

I can do all my querying inside Hive, for example:

SHOW TABLES; 
>>customers
  students
  ...

Then I can get data from the tables:

SELECT * FROM customers limit 100;

How do I string these two together inside Spark?

Thanks.

Upvotes: 0

Views: 1688

Answers (1)

Arvind Kumar

Reputation: 1335

// sc is an existing SparkContext (spark-shell provides one as `sc`).

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Queries are expressed in HiveQL; sql() returns a DataFrame.

val tablelist = sqlContext.sql("SHOW TABLES")
val custdf = sqlContext.sql("SELECT * FROM customers LIMIT 100")

// collect() pulls the rows to the driver, so use it only on small results.
tablelist.collect().foreach(println)
custdf.collect().foreach(println)
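
Outside spark-shell, the same steps can be sketched as a standalone application. This is only a minimal sketch under some assumptions: Spark 1.5.x built with Hive support, a hive-site.xml placed in Spark's conf/ directory so that HiveContext talks to your existing metastore (without it, Spark creates a local metastore in the working directory and will not see your tables), and the master URL supplied by spark-submit. The object name HiveQueryExample and the app name are made up for illustration; customers is the table from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// Hypothetical object name, used only for this illustration.
object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit (--master ...).
    val conf = new SparkConf().setAppName("HiveQueryExample")
    val sc = new SparkContext(conf)

    try {
      // Assumes hive-site.xml is on Spark's classpath (e.g. in conf/).
      val hiveContext = new HiveContext(sc)

      // List the tables visible in the current Hive database.
      hiveContext.sql("SHOW TABLES").show()

      // Run a HiveQL query; the result is a DataFrame.
      val customers: DataFrame =
        hiveContext.sql("SELECT * FROM customers LIMIT 100")

      // Inspect the schema and print the first 10 rows.
      customers.printSchema()
      customers.show(10)
    } finally {
      // Release cluster resources even if the query fails.
      sc.stop()
    }
  }
}
```

Using show() instead of collect().foreach(println) prints a bounded, tabular preview, which tends to be safer when you are not sure how large the result is.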

Upvotes: 0
