Reputation: 18108
Given that I can get the singleton insert statements below to work (courtesy of another Stack Overflow question, thanks):
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS e360_models.employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1210, 'rahul', 55) t")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1211, 'sriram pv', 35) t")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1212, 'gowri', 59) t")
val result = sqlContext.sql("FROM e360_models.employee SELECT id, name, age")
result.show()
what about doing an INSERT ... SELECT from a Spark DataFrame registered as a temporary table into an already existing Hive table? I cannot seem to get it to work. Is it in fact possible?
Using Spark 1.6. I am not interested in creating a table via CTAS, but rather in inserting as per the above, but in bulk, e.g.
sqlContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
Upvotes: 3
Views: 2755
Reputation: 29237
As I understood it, you want to insert some data into
e360_models.employee
then select some columns and insert them into default.ged_555,
and you don't want to do a CTAS. Prepare a dataframe from e360_models.employee
and then do it like below:
// since you are using Hive, I used hiveContext below...
import org.apache.spark.sql.SaveMode

val dataframe = hiveContext.sql("select * from e360_models.employee")
dataframe.show(10)      // verify whether the data is in the dataframe or not
dataframe.printSchema() // print the schema as well, for debugging
dataframe.write.mode(SaveMode.Overwrite).insertInto("default.ged_555")
val sampleDataFrame = hiveContext.sql("select * from default.ged_555")
// again, print 10 records to verify the result, for debugging
sampleDataFrame.show(10)
// again print schema of the target table
sampleDataFrame.printSchema()
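Alternatively, if the source data is already in a DataFrame, the INSERT ... SELECT form from the question should also work, provided the temporary table is registered on the same HiveContext (a plain SQLContext cannot see the Hive tables). A minimal sketch, assuming `df` is your source DataFrame (the name is illustrative):

```scala
// Spark 1.6: register the DataFrame as a temporary table on the HiveContext
df.registerTempTable("mytempTable")

// bulk insert into the existing Hive table via INSERT ... SELECT
hiveContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
```

Note that the column order and types of the SELECT must line up with the target table's schema, since the insert is positional.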
Upvotes: 1