Reputation: 4353
New to Spark and Hive. Currently I can run Spark 1.5.2, and I also have access to Hive from the command line. I want to be able to programmatically connect to the Hive database, run a query, and extract the data into a DataFrame, all inside Spark. I imagine this sort of workflow is pretty standard, but I have no idea how to do it.
Right now I know I can get a HiveContext in Spark:

import org.apache.spark.sql.hive.HiveContext
I can do all my querying inside Hive, like:
SHOW TABLES;
>>customers
students
...
Then I can get data from the tables:
SELECT * FROM customers limit 100;
How do I string these 2 together inside spark?
Thanks.
Upvotes: 0
Views: 1688
Reputation: 1335
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Queries are expressed in HiveQL; each call to sql() returns a DataFrame.
val tablelist = sqlContext.sql("SHOW TABLES")
val custdf = sqlContext.sql("SELECT * FROM customers LIMIT 100")

// collect() brings the results back to the driver as an Array[Row].
tablelist.collect().foreach(println)
custdf.collect().foreach(println)
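Since `sql()` returns a DataFrame, you can also keep working with it through the DataFrame API instead of collecting everything to the driver. A minimal sketch (the column names `name` and `age` are hypothetical; substitute whatever columns your `customers` table actually has):

```scala
// Keep the query lazy and let Spark plan the work on the cluster.
val custdf = sqlContext.sql("SELECT * FROM customers")

// Inspect the schema Hive reports for the table.
custdf.printSchema()

// DataFrame operations compose before anything is executed;
// 'age' here is an assumed column name for illustration.
val adults = custdf.filter(custdf("age") >= 18).select("name", "age")

// show() prints a small sample without pulling the full table
// back to the driver the way collect() does.
adults.show(20)
```

This avoids the pitfall of `collect()` on a large table, which materializes every row in driver memory; prefer `show()`, `take(n)`, or writing the result back out for anything big.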
Upvotes: 0