Reputation: 682
I have used one way to save a dataframe as an external table using the Parquet file format, but is there some other way to save dataframes directly as an external table in Hive, like we have saveAsTable for managed tables?
Upvotes: 6
Views: 25500
Reputation: 1086
In PySpark, an external table can be created as below:
df.write.option('path','<External Table Path>').saveAsTable('<Table Name>')
Upvotes: 5
Reputation: 342
You can do it this way:
df.write.format("orc").options(Map("path" -> "yourpath")).saveAsTable("anubhav")
Upvotes: 5
Reputation: 960
You can also save the dataframe with a manual CREATE TABLE:
dataframe.registerTempTable("temp_table");
hiveSqlContext.sql(
  "create external table if not exists table_name as select * from temp_table");
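Note that Hive's CREATE TABLE ... AS SELECT traditionally cannot create an external table, and an external table normally needs an explicit LOCATION, so a two-step variant is safer. A minimal sketch, assuming the same hiveSqlContext; the schema and path here are hypothetical:
dataframe.registerTempTable("temp_table");
// Create the external table with an explicit schema and location first...
hiveSqlContext.sql(
  """create external table if not exists table_name (id int, name string)
    |stored as parquet
    |location '/user/hive/external/table_name'""".stripMargin);
// ...then populate it from the temp table.
hiveSqlContext.sql("insert overwrite table table_name select * from temp_table");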
The link below has a good explanation of CREATE TABLE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
Upvotes: -2
Reputation: 1903
For an external table, don't use saveAsTable. Instead, save the data at the location of the external table specified by path, then add the partition so it is registered with the Hive metastore. This will allow you to query by partition from Hive later.
// hc is HiveContext, df is DataFrame.
import org.apache.spark.sql.SaveMode

// Write the data files to the partition's directory under the external table's path.
df.write.mode(SaveMode.Overwrite).parquet(path)

// Register the new partition with the Hive metastore.
val sql =
  s"""
     |alter table $targetTable
     |add if not exists partition
     |(year=$year,month=$month)
     |location "$path"
   """.stripMargin
hc.sql(sql)
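Note this assumes $targetTable was already created as a partitioned external table; a one-time setup sketch (the schema and base path here are hypothetical):
// Hypothetical schema; the partition columns must match the ADD PARTITION spec above.
hc.sql(
  s"""
     |create external table if not exists $targetTable (id int, name string)
     |partitioned by (year int, month int)
     |stored as parquet
     |location '/data/external/target_table'
   """.stripMargin)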
Upvotes: 3