guptashail

Reputation: 43

How to delete a Parquet file on Spark?

I saved a Parquet file in Spark using the DataFrame.saveAsParquet() command.

How can I delete/remove this file from Python code?

Upvotes: 4

Views: 23521

Answers (2)

Markus

Reputation: 2455

Since @bsplosion mentioned HDFS, here is how you could do it in a PySpark script:

import subprocess

print("Deletion code:", subprocess.call(["hadoop", "fs", "-rm", "-r", "-skipTrash", "hdfs:/your/data/path"]))

# hadoop     - calls hadoop
# fs         - calls Hadoop's file system shell
# -rm        - calls the remove command
# -r         - recursive removal in order to remove the entire directory
# -skipTrash - As it states: Skip the trash and directly remove everything

This prints "Deletion code: 0" if the deletion succeeds; otherwise it prints a non-zero exit code. You can read more about Hadoop's -rm command in the Hadoop FileSystem Shell documentation.
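If you want to act on the result rather than just print it, the same call can be wrapped in a small helper. This is only a sketch: the function names build_rm_command and delete_hdfs_path are made up for illustration, and it assumes the hadoop executable is on the PATH of the machine running the script.

```python
import subprocess


def build_rm_command(path, skip_trash=True):
    """Build the `hadoop fs -rm -r` argument list for the given path.

    Hypothetical helper; only uses flags shown in the answer above.
    """
    cmd = ["hadoop", "fs", "-rm", "-r"]
    if skip_trash:
        cmd.append("-skipTrash")  # bypass the trash and delete immediately
    cmd.append(path)
    return cmd


def delete_hdfs_path(path, skip_trash=True):
    """Run the removal and return True if the exit code was 0.

    Raises FileNotFoundError if the `hadoop` executable is not on the PATH.
    """
    result = subprocess.run(
        build_rm_command(path, skip_trash),
        capture_output=True,  # requires Python 3.7+
        text=True,
    )
    if result.returncode != 0:
        print("Deletion failed:", result.stderr.strip())
    return result.returncode == 0
```

You would then call something like delete_hdfs_path("hdfs:/your/data/path") and check the boolean it returns. Passing skip_trash=False leaves out -skipTrash, so the data goes to the HDFS trash if trash is enabled on the cluster.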

Upvotes: 0

David

Reputation: 11573

This Parquet "file" will actually be a directory. If it is on the local filesystem, you can delete the directory and the files in it with shutil.rmtree:

import shutil
shutil.rmtree('/folder_name')
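A slightly more defensive version might check what is at the path first, so the call does not raise if the output has already been removed. This is a sketch for local paths only; delete_parquet_dir is a hypothetical name, and the temporary directory below just stands in for real Spark output.

```python
import os
import shutil
import tempfile


def delete_parquet_dir(path):
    """Remove a local Parquet output (directory or single file).

    Returns True if something was removed, False if the path did not exist.
    """
    if os.path.isdir(path):
        shutil.rmtree(path)  # removes the directory and all part files in it
        return True
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


# Demo with a throwaway directory that mimics Spark's output layout.
demo_dir = tempfile.mkdtemp(suffix=".parquet")
open(os.path.join(demo_dir, "part-00000.parquet"), "w").close()
removed = delete_parquet_dir(demo_dir)
print("Removed:", removed)
```

Note that shutil only sees the filesystem of the machine running the script, so this does not apply to data stored on HDFS.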

Upvotes: 5
