I have saved a Parquet file with Spark using the DataFrame.saveAsParquet() command.
How can I delete this file from Python code?
Since @bsplosion mentioned HDFS, here is how you could do it in a PySpark script:
import subprocess
print("Deletion code:", subprocess.call(["hadoop", "fs", "-rm", "-r", "-skipTrash", "hdfs:/your/data/path"]))
# hadoop - invokes the hadoop command-line tool
# fs - runs Hadoop's file system shell
# -rm - the remove command
# -r - remove recursively, so the entire directory is deleted
# -skipTrash - bypass the trash and delete everything immediately
This prints Deletion code: 0 if the command succeeds and a non-zero code otherwise.
You can read more about Hadoop's -rm command in the Hadoop FileSystem Shell documentation.
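The same call can be wrapped in a small helper that uses subprocess.run and checks the exit status instead of printing it. This is a minimal sketch: the function names build_rm_command and delete_hdfs_path are hypothetical, and it assumes the hadoop executable is on the PATH of the machine running the script.

```python
import subprocess
from typing import List


def build_rm_command(path: str, skip_trash: bool = True) -> List[str]:
    """Build the `hadoop fs -rm -r` argument list for `path` (hypothetical helper)."""
    cmd = ["hadoop", "fs", "-rm", "-r"]
    if skip_trash:
        # Delete immediately instead of moving the data into the HDFS trash
        cmd.append("-skipTrash")
    cmd.append(path)
    return cmd


def delete_hdfs_path(path: str, skip_trash: bool = True) -> bool:
    """Run the removal and return True if hadoop exited with status 0 (hypothetical helper).

    Assumes the `hadoop` executable is installed and on PATH.
    """
    result = subprocess.run(build_rm_command(path, skip_trash))
    return result.returncode == 0
```

Usage: delete_hdfs_path("hdfs:/your/data/path"). Passing the arguments as a list (rather than a single shell string) avoids quoting problems if the path contains spaces or special characters.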
This Parquet "file" is actually a directory. A directory with files in it can be deleted with shutil.rmtree (this works for local paths, not for HDFS):
import shutil
# Recursively delete the directory and all files inside it
shutil.rmtree('/folder_name')
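To see why rmtree is needed rather than os.remove, the sketch below builds a directory laid out like a typical Spark Parquet output (part files plus a _SUCCESS marker; the exact file names are illustrative assumptions) in a temporary location and then deletes it. The helper name delete_parquet_dir is hypothetical, and it only handles paths on the local filesystem.

```python
import os
import shutil
import tempfile


def delete_parquet_dir(path: str) -> bool:
    """Remove a local Parquet output directory and everything in it (hypothetical helper).

    Returns False if nothing existed at `path`, True after a successful delete.
    """
    if not os.path.exists(path):
        return False
    # rmtree removes the directory recursively; os.remove would fail on a directory
    shutil.rmtree(path)
    return True


# Demo: fake a Spark-style output directory in a temp location
base = tempfile.mkdtemp()
out_dir = os.path.join(base, "data.parquet")
os.makedirs(out_dir)
for name in ("part-00000.snappy.parquet", "part-00001.snappy.parquet", "_SUCCESS"):
    with open(os.path.join(out_dir, name), "wb"):
        pass  # empty placeholder files

print(delete_parquet_dir(out_dir))  # True
print(os.path.exists(out_dir))      # False
print(delete_parquet_dir(out_dir))  # False
shutil.rmtree(base)  # clean up the temporary parent directory
```

Returning False for a missing path (instead of letting rmtree raise FileNotFoundError) makes the helper safe to call before re-running a job that writes to the same location.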