bda

Reputation: 422

Overwriting a file in PySpark, without affecting others

I need to save a dataframe as a parquet file. If a directory for a given file already exists, I need to overwrite it, but the directories above it should not be overwritten.

Example:

root/2021/12/01/file1.parquet
root/2021/12/02/file2.parquet
root/2021/12/03/file3.parquet

If /2021/12/01/file1.parquet is re-created (or overwritten), the other two files under root must remain as-is. The path /2021/12 is part of the partition structure of these files, so writing with .mode("overwrite") also overwrites the other two files when file1 is re-created.

How can this be accomplished in PySpark?

Upvotes: 0

Views: 339

Answers (1)

SherKhan

Reputation: 96

df.write.mode("overwrite").parquet("/tmp/output/people.parquet")
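
The line above only helps if the path it is given is narrow enough. As a minimal sketch, assuming the date directories are plain folders (root/2021/12/01, not Hive-style year=2021/...) and that each run rewrites exactly one day, the writer can be pointed at the leaf directory instead of root. The helper names partition_dir and overwrite_one_day are made up for illustration, and df is assumed to be an existing PySpark DataFrame.

```python
from datetime import date


def partition_dir(root: str, day: date) -> str:
    """Build the leaf directory for one day, e.g. root/2021/12/01."""
    return f"{root.rstrip('/')}/{day:%Y/%m/%d}"


def overwrite_one_day(df, root: str, day: date) -> str:
    """Overwrite only the directory for `day` (hypothetical helper).

    `df` is assumed to be a pyspark.sql.DataFrame.
    """
    target = partition_dir(root, day)
    # mode("overwrite") replaces whatever already exists at the path given
    # to the writer. Passing the leaf directory rather than `root` means
    # the sibling day directories are never part of the write.
    df.write.mode("overwrite").parquet(target)
    return target
```

Usage would look like overwrite_one_day(df, "root", date(2021, 12, 1)). Note that Spark writes a directory of part files at the target path, so the result is root/2021/12/01/part-*.parquet rather than a single file named file1.parquet. If the data were instead written to root with partitionBy, setting spark.sql.sources.partitionOverwriteMode to "dynamic" (Spark 2.3+) is another option that may fit, but it produces key=value directory names, which differs from the layout in the question.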

Upvotes: -1
