Reputation: 18108
Given that I can get the singleton insert statements below to work (courtesy of another Stack Overflow question, thanks):
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS e360_models.employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1210, 'rahul', 55) t")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1211, 'sriram pv', 35) t")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1212, 'gowri', 59) t")
val result = sqlContext.sql("FROM e360_models.employee SELECT id, name, age")
result.show()
what about doing an INSERT ... SELECT from a Spark DataFrame registered as a temporary table into an already existing Hive table? I cannot seem to get it to work. Is it in fact possible?
Using Spark 1.6. I am not interested in creating a table via CTAS, but rather in inserting as per the above, but in bulk, e.g.
sqlContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
Upvotes: 3
Views: 2755
Reputation: 29237
As I understood it, you want to insert some data into
e360_models.employee
then select some columns and insert them into default.ged_555,
and you don't want to do a CTAS. Prepare a dataframe from e360_models.employee
and then do it like below:
// since you are using Hive, I used hiveContext below...
import org.apache.spark.sql.SaveMode

val dataframe = hiveContext.sql("select * from e360_models.employee")
dataframe.show(10)      // verify whether the data is in the dataframe or not
dataframe.printSchema() // print the schema as well, for debugging
dataframe.write.mode(SaveMode.Overwrite).insertInto("default.ged_555")
val sampleDataFrame = hiveContext.sql("select * from default.ged_555")
// again, print 10 records to verify the result, for debugging
sampleDataFrame.show(10)
// again print schema of the target table
sampleDataFrame.printSchema()
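Alternatively, if the source data is already in a DataFrame, the INSERT ... SELECT form from the question should also work, provided the temporary table is registered on the same HiveContext (a plain SQLContext cannot see the Hive tables). A minimal sketch, assuming `df` is your source DataFrame (the name is illustrative):

```scala
// Spark 1.6: register the DataFrame as a temporary table on the HiveContext
df.registerTempTable("mytempTable")

// bulk insert into the existing Hive table via INSERT ... SELECT
hiveContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
```

Note that the column order and types of the SELECT must line up with the target table's schema, since the insert is positional.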
Upvotes: 1