seth127

Reputation: 2724

formats for pyspark.sql.DataFrameWriter.saveAsTable()

Does anyone know where I can find a list of available formats for the saveAsTable() function in pyspark.sql.DataFrameWriter? In the documentation it just says "the format used to save."

Every example I see uses 'parquet', but I can't find any other format mentioned. Specifically, I would like to save to Feather from PySpark somehow.

Thank you!

Upvotes: 5

Views: 3921

Answers (1)

HBX

Reputation: 99

Hi, to my knowledge, the formats supported out of the box, per the source code (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala), are:

  • parquet
  • json
  • orc
  • jdbc
  • text
  • csv
  • the default source (if you don't call format(), Spark uses the value of spark.sql.sources.default, which is parquet unless configured otherwise)

So Feather is not supported out of the box by saveAsTable(). Depending on your setup, you could try writing the file yourself (for example, directly to HDFS), which would look something like:

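import feather  # the standalone feather-format package
pdf = df.toPandas()  # feather writes pandas DataFrames, so collect the Spark DataFrame to the driver first
path = "my_data.feather"  # depending on your setup, this could be your full HDFS URI
feather.write_dataframe(pdf, path)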

(Taken from the feather integration tests: https://github.com/wesm/feather/blob/6b5a27c58d1e850f4eabb8c013e0976b8844eb3c/integration-tests/test_roundtrips.py)

Hopefully this was helpful; let me know if anything is wrong or unclear.

Upvotes: 3
