Reputation: 545
I am wondering how one could customize the table settings used by DataFrameWriter#saveAsTable.
Is there any way to adjust the storage format (e.g. Avro or ORC), the compression (e.g. "snappy"), and the location of the table built from the DataFrame?
What I am looking for is the Spark 2 DataFrameWriter#saveAsTable equivalent of creating a managed Hive table with the custom settings you would normally pass to the Hive CREATE TABLE command, such as:
STORED AS <format>
LOCATION <hdfs_path>
TBLPROPERTIES("orc.compress"="SNAPPY")
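In other words, I am hoping for something along these lines (just a sketch of what I imagine the API might look like; I am not sure which of these options saveAsTable actually supports):
df.write
  .format("orc")                       // STORED AS <format>
  .option("compression", "snappy")     // TBLPROPERTIES("orc.compress"="SNAPPY")
  .option("path", "hdfs:///some/path") // LOCATION <hdfs_path>
  .saveAsTable("my_table")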
Upvotes: 2
Views: 14089
Reputation: 1525
Below is sample code for saving data in different formats. You can also adjust the compression used while saving; for example, the following writes a gzip-compressed CSV file:
df.write
  .format("com.databricks.spark.csv") // on Spark 2.x you can simply use "csv"
  .option("header", "true")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec") // gzip the output files
  .save("newcars.csv.gz")
Similarly, you can write the DataFrame out as a compressed ORC table:
df.write
  .format("orc")
  .mode("overwrite")
  .option("compression", "snappy") // ORC takes "compression" (none, snappy, zlib, lzo); Hadoop codec classes apply to text formats, not ORC
  .saveAsTable("tbl_nm")
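To verify the format and compression the table was created with, you can inspect its metastore entry (a quick check, assuming your SparkSession is bound to spark):
spark.sql("DESCRIBE FORMATTED tbl_nm").show(100, false)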
Upvotes: 3
Reputation: 11449
ORC format:
df.write.format("orc").mode("overwrite").saveAsTable("default.spark1")
Parquet format:
df.write
.format("parquet")
.mode("overwrite")
.save("/home/prashant/spark-data/mental-health-in-tech-survey/parquet-data/")
Upvotes: 1