Reputation: 545
I am wondering how one could customize the table settings used by DataFrameWriter#saveAsTable.
Is there any way to adjust the storage format (e.g. Avro or ORC), the compression (e.g. "snappy"), and the location of the table built from the DataFrame?
What I am looking for is the Spark 2 DataFrameWriter#saveAsTable equivalent of creating a managed Hive table with the custom settings you would normally pass to the Hive CREATE TABLE command, such as:
STORED AS <format>
LOCATION <hdfs_path>
TBLPROPERTIES("orc.compress"="SNAPPY")
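In other words, I am hoping for something along these lines (just a sketch of what I imagine the API might look like; I am not sure which of these options saveAsTable actually supports):
df.write
  .format("orc")                       // STORED AS <format>
  .option("compression", "snappy")     // TBLPROPERTIES("orc.compress"="SNAPPY")
  .option("path", "hdfs:///some/path") // LOCATION <hdfs_path>
  .saveAsTable("my_table")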
Upvotes: 2
Views: 14089
Reputation: 1525
Below is sample code for saving data in different formats. You can also adjust the compression used while saving; for example, the following writes a gzip-compressed CSV file:
df.write
  .format("com.databricks.spark.csv") // on Spark 2.x you can simply use "csv"
  .option("header", "true")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec") // gzip the output files
  .save("newcars.csv.gz")
Similarly, you can write the DataFrame out as a compressed ORC table:
df.write
  .format("orc")
  .mode("overwrite")
  .option("compression", "snappy") // ORC takes "compression" (none, snappy, zlib, lzo); Hadoop codec classes apply to text formats, not ORC
  .saveAsTable("tbl_nm")
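To verify the format and compression the table was created with, you can inspect its metastore entry (a quick check, assuming your SparkSession is bound to spark):
spark.sql("DESCRIBE FORMATTED tbl_nm").show(100, false)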
Upvotes: 3
Reputation: 11449
ORC format:
df.write.format("orc").mode("overwrite").saveAsTable("default.spark1")
Parquet format:
df.write
.format("parquet")
.mode("overwrite")
.save("/home/prashant/spark-data/mental-health-in-tech-survey/parquet-data/")
Upvotes: 1