codeogeek

Reputation: 682

Save DataFrame as external Hive table

I have used one way to save a DataFrame as an external table, using the Parquet file format, but is there some other way to save DataFrames directly as external tables in Hive, like we have saveAsTable for managed tables?

Upvotes: 6

Views: 25500

Answers (4)

Ankur

Reputation: 1086

In PySpark, an external table can be created as follows:

df.write.option('path','<External Table Path>').saveAsTable('<Table Name>')
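
For context, a minimal end-to-end PySpark sketch of this approach (the path and table name here are hypothetical): because an explicit path is supplied, Spark registers the table as external (unmanaged), which you can confirm with DESCRIBE FORMATTED.

from pyspark.sql import SparkSession

# Hive support is required so saveAsTable registers the table
# in the Hive metastore.
spark = SparkSession.builder \
    .appName("external-table-demo") \
    .enableHiveSupport() \
    .getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Supplying an explicit path makes the table external (unmanaged):
# DROP TABLE removes only the metadata, not the files.
df.write.option("path", "/user/hive/external/demo_table") \
    .saveAsTable("demo_table")

# The "Type" row of the output should read EXTERNAL.
spark.sql("describe formatted demo_table").show(truncate=False)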

Upvotes: 5

anubhav

Reputation: 342

You can do it this way:

df.write.format("orc").options(Map("path" -> "yourpath")).saveAsTable("anubhav")

Upvotes: 5

Noman Khan

Reputation: 960

You can also save the DataFrame by creating the table manually:

// register the DataFrame as a temp table, then create the Hive table from it
dataframe.registerTempTable("temp_table")
hiveSqlContext.sql(
  "create external table if not exists table_name " +
  "as select * from temp_table")

The link below has a good explanation of CREATE TABLE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html

Upvotes: -2

ruhong

Reputation: 1903

For an external table, don't use saveAsTable. Instead, save the data at the location of the external table specified by path, then add the partition so that it is registered with the Hive metastore. This allows you to query by partition from Hive later.

import org.apache.spark.sql.SaveMode

// hc is HiveContext, df is DataFrame
df.write.mode(SaveMode.Overwrite).parquet(path)

val sql =
  s"""
     |alter table $targetTable
     |add if not exists partition
     |(year=$year,month=$month)
     |location "$path"
   """.stripMargin
hc.sql(sql)

Upvotes: 3
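
Once the partition is registered, queries can prune by partition. A minimal PySpark sketch of such a query, assuming a Hive-enabled session named spark; the table name and partition values are placeholders for whatever the alter table statement above registered:

# target_table, year and month are hypothetical; only the files under
# year=2017/month=1 are scanned thanks to partition pruning.
result = spark.sql(
    "select * from target_table where year = 2017 and month = 1")
result.show()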
