6:[["$","$Le",null,{}],["$","div",null,{"className":"min-h-screen bg-gray-100 p-6","children":[["$","$Lf",null,{}],["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"QAPage\",\"mainEntity\":{\"@type\":\"Question\",\"name\":\"How to delete a Parquet file on Spark?\",\"text\":\"

I have saved a Parquet file on Spark using the DataFrame.saveAsParquet() command.

\\n\\n

How can I delete/remove this file via Python code?

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"guptashail\"},\"upvoteCount\":4,\"answerCount\":2,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"

This Parquet \\\"file\\\" is actually a directory. If it was written to the local file system, you can delete the directory and the files in it like this:

\\n\\n

import shutil\\nshutil.rmtree('/folder_name')\\n

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"David\"},\"upvoteCount\":5}}}"}}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mb-6 relative","children":[["$","div",null,{"className":"absolute top-4 right-4 flex flex-wrap space-x-2","children":[["$","span","python",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/python/1","children":"python"}]}],["$","span","apache-spark",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/apache-spark/1","children":"apache-spark"}]}],["$","span","parquet",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/parquet/1","children":"parquet"}]}]]}],["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/c3e1563b01209bc2a46d24f9cd6ac95b?s=256&d=identicon&r=PG&f=y&so-version=2","alt":"guptashail","className":"w-16 h-16 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/6420021/guptashail","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"guptashail"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",43]}]]}]]}],["$","h1",null,{"className":"text-2xl font-bold text-gray-800 mb-4","children":"How to delete a Parquet file on Spark?"}],["$","p",null,{"className":"text-gray-700 mt-4","dangerouslySetInnerHTML":{"__html":"

I have saved a Parquet file on Spark using the DataFrame.saveAsParquet() command.

\n\n

How can I delete/remove this file via Python code?

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm mt-4","children":[["$","p",null,{"children":["Upvotes: ",4]}],["$","p",null,{"children":["Views: ",23647]}]]}]]}],["$","div",null,{"className":"container mx-auto","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-6","children":["Answers (",2,")"]}],[["$","div","61818413",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://i.sstatic.net/miE8t.png?s=256","alt":"Markus","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/3757672/markus","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Markus"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",2455]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

Since @bsplosion mentioned HDFS, here is how you could do it in a PySpark script:

\n\n

import subprocess\n\n# subprocess.call returns the exit code of the hadoop command\nprint(\"Deletion code:\", subprocess.call([\"hadoop\", \"fs\", \"-rm\", \"-r\", \"-skipTrash\", \"hdfs:/your/data/path\"]))\n\n# hadoop     - invokes the Hadoop CLI\n# fs         - selects Hadoop's file system shell\n# -rm        - the remove command\n# -r         - recursive removal, needed to remove the entire directory\n# -skipTrash - bypasses the trash and deletes the data immediately\n

\n\n

This prints Deletion code: 0 if the command succeeds, and a non-zero code otherwise.\nYou can read more about Hadoop's -rm command in the Hadoop file system shell documentation.

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",0]}]}]]}],["$","div","37617472",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/422d7176a99e3227d315c185a4b60f06?s=256&d=identicon&r=PG&f=y&so-version=2","alt":"David","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/5827767/david","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"David"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",11593]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

This Parquet \"file\" is actually a directory. If it was written to the local file system, you can delete the directory and the files in it like this:

\n\n

import shutil\nshutil.rmtree('/folder_name')\n

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",5]}]}]]}]]]}]]}],["$","$L11",null,{}],["$","$L12",null,{}],["$","$L13",null,{}],["$","$L14",null,{}],["$","$L15",null,{}]]
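
The two answers cover different storage locations: shutil.rmtree only reaches paths on the local (or locally mounted) file system, while the hadoop command targets HDFS. A minimal sketch that wraps both approaches is shown here. The function names and the `runner` parameter are hypothetical and were added for illustration; they are not part of Spark or Hadoop. The sketch assumes the `hadoop` executable is on the PATH whenever the HDFS variant is used with its default runner.

```python
import shutil
import subprocess
from pathlib import Path


def delete_local_parquet(path):
    """Remove a Parquet output directory (or single file) from the local file system.

    Returns True if something was deleted, False if the path did not exist.
    """
    target = Path(path)
    if target.is_dir():
        # Spark writes Parquet output as a directory of part files
        shutil.rmtree(target)
        return True
    if target.exists():
        # A single Parquet file, for example one written by another tool
        target.unlink()
        return True
    return False


def delete_hdfs_parquet(path, runner=subprocess.call):
    """Remove a Parquet directory on HDFS by shelling out to the Hadoop CLI.

    `runner` is a hypothetical injection point (an assumption of this sketch)
    so the command can be exercised without a Hadoop installation.
    Returns True when the command exits with code 0.
    """
    command = ["hadoop", "fs", "-rm", "-r", "-skipTrash", path]
    return runner(command) == 0
```

Calling delete_local_parquet('/folder_name') mirrors the accepted answer, and delete_hdfs_parquet('hdfs:/your/data/path') mirrors the subprocess answer. Returning a boolean instead of printing the exit code makes it easier to branch on failure in a larger script.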
