Reputation: 422
Overwriting a file in PySpark, without affecting others.
I need to save a dataframe as a parquet file. If the directory for a given file already exists, I need to overwrite it, but sibling directories elsewhere in the tree should not be overwritten.
Example:
root/2021/12/01/file1.parquet
root/2021/12/02/file2.parquet
root/2021/12/03/file3.parquet
If root/2021/12/01/file1.parquet is re-created (or overwritten), the other two files under root should remain as-is. The path 2021/12 is part of the partition structure of these files, so writing with .mode("overwrite") also replaces the other two files when file1 is re-created.
How can this be accomplished in PySpark?
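One possible approach, sketched here under assumptions rather than offered as a definitive answer: point the write at the leaf path for the single day instead of at root, since overwrite mode replaces only the contents of the path it is given. The helper names `day_path` and `overwrite_one_day` are hypothetical, and the sketch assumes `df` is a PySpark DataFrame and that the layout is plain `YYYY/MM/DD` directories as in the example above.

```python
from datetime import date


def day_path(root: str, day: date, name: str) -> str:
    """Build the leaf path for one day, e.g. root/2021/12/01/file1.parquet."""
    # Hypothetical helper: assumes a plain YYYY/MM/DD directory layout.
    return f"{root.rstrip('/')}/{day:%Y/%m/%d}/{name}"


def overwrite_one_day(df, root: str, day: date, name: str) -> str:
    """Overwrite only the target path for `day`; other paths are not touched."""
    target = day_path(root, day, name)
    # Overwrite mode applies to `target` only, not to the rest of `root`.
    df.write.mode("overwrite").parquet(target)
    return target
```

For example, `overwrite_one_day(df, "root", date(2021, 12, 1), "file1.parquet")` would write to `root/2021/12/01/file1.parquet` only. If the data were instead written with `partitionBy` on Hive-style partition columns, setting `spark.sql.sources.partitionOverwriteMode` to `dynamic` may be an alternative worth checking, though that is a different layout from the one shown in the question.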
Upvotes: 0
Views: 339