user979899

Reputation: 153

Spark: remove an Apache ORC file

I have stored a Spark DataFrame as an ORC file using the Spark shell, as follows:

    jdbcDF.write.format("orc").partitionBy("ID").save("applicationsPartitioned")

I found out that the data now lives in windows\system32\applicationsPartitioned.

How do I properly remove the ORC file? I could just shut down Spark and remove the directory myself, but is there any metadata stored somewhere about this directory?

Upvotes: 0

Views: 993

Answers (2)

Assaf Mendelson

Reputation: 12991

You have to do it manually; however, you can use the Hadoop FileSystem API to do it.

For example:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    // The second argument enables recursive deletion of the directory and its contents
    fs.delete(new Path("applicationsPartitioned"), true)

This makes the deletion OS- and filesystem-independent.
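
For example, run from the Spark shell against the directory in the question, you could guard the delete with an existence check first (a minimal sketch; fs.exists is part of the same Hadoop FileSystem API):

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val out = new Path("applicationsPartitioned")
    // Only attempt the delete if the output directory actually exists
    if (fs.exists(out)) {
      fs.delete(out, true)
    }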

Upvotes: 1

koiralo

Reputation: 23099

I think you have to remove the directory manually, but if you are only removing it so you can write the next output to the same location, you can simply use mode(SaveMode.Overwrite) to overwrite the existing directory:

    import org.apache.spark.sql.SaveMode

    jdbcDF.write.format("orc")
      .mode(SaveMode.Overwrite)
      .partitionBy("ID")
      .save("applicationsPartitioned")
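
Note that with the default settings, SaveMode.Overwrite deletes the entire target directory before writing. As a sketch, on Spark 2.3+ you can instead replace only the partitions present in the new data by setting spark.sql.sources.partitionOverwriteMode:

    // Sketch for Spark 2.3+: replace only the partitions contained in jdbcDF,
    // leaving other existing partitions in place
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    jdbcDF.write.format("orc")
      .mode(SaveMode.Overwrite)
      .partitionBy("ID")
      .save("applicationsPartitioned")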

Hope this helps!

Upvotes: 1
