Peter G. Horvath

Reputation: 545

Spark DataFrame saveAsTable:

I am wondering how one could customize the table settings used by DataFrameWriter#saveAsTable.

Is there any way to adjust the storage format (e.g. using Avro or ORC), compression (to use "snappy", etc.) and the location of the table built out of the DataFrame?

What I am looking for is the Spark 2 DataFrameWriter#saveAsTable equivalent of creating a managed Hive table with the custom settings you would normally pass to the Hive CREATE TABLE command, along the lines of the sketch below:
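For example, something like this (table name, columns, location, and properties are placeholders of my own, since I don't have a concrete snippet to paste), issued via spark.sql so it runs from Spark 2 with Hive support enabled:

// hypothetical Hive DDL showing the kind of settings meant:
// storage format, compression, and table location
spark.sql("""
    CREATE TABLE my_db.my_table (id INT, name STRING)
    STORED AS ORC
    LOCATION '/user/hive/warehouse/custom/my_table'
    TBLPROPERTIES ('orc.compress' = 'SNAPPY')
""")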

Upvotes: 2

Views: 14089

Answers (2)

Ajay Kharade

Reputation: 1525

Below is code for saving data in different formats, such as:

  1. CSV
  2. Parquet
  3. Avro
  4. ORC
  5. JSON
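As a minimal sketch, one write per format (output paths are placeholders; the "avro" source needs the external spark-avro package before Spark 2.4):

// one write per listed format; output paths are placeholders
df.write.format("csv").option("header", "true").save("/tmp/out/csv")
df.write.format("parquet").save("/tmp/out/parquet")
df.write.format("avro").save("/tmp/out/avro")
df.write.format("orc").save("/tmp/out/orc")
df.write.format("json").save("/tmp/out/json")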

You can also pick a compression codec while saving data; below is a sample statement for CSV:

// the spark-csv package name still works on Spark 2, where it is
// mapped to the built-in "csv" source; "codec" gzip-compresses the output
df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv.gz")

==============================================================================

// for ORC, pass the "compression" option with an ORC codec name
// ("zlib" is the closest equivalent of gzip); the Hadoop codec class
// used above for CSV is not picked up by the ORC source
df.write
    .format("orc")
    .mode("overwrite")
    .option("compression", "zlib")
    .saveAsTable("tbl_nm")
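Note that the ORC and Parquet sources in Spark 2 read the "compression" option with format-specific codec names (ORC: "none", "snappy", "zlib", "lzo"; Parquet: "none", "snappy", "gzip", "lzo").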

Upvotes: 3

vaquar khan

Reputation: 11449

ORC format

  df.write.format("orc").mode("overwrite").saveAsTable("default.spark1")

Parquet format

df.write
    .format("parquet")
    .mode("overwrite")
    .save("/home/prashant/spark-data/mental-health-in-tech-survey/parquet-data/")                                   

Upvotes: 1
