user979899

Reputation: 153

Spark: remove an Apache ORC file

I have stored a Spark DataFrame as an ORC file using the Spark shell, as follows:

    jdbcDF.write.format("orc").partitionBy("ID").save("applicationsPartitioned")

I found out that the data now lives in windows\system32\applicationsPartitioned.

How do I properly remove the ORC file? I could just shut down Spark and remove the directory myself, but is there any metadata stored somewhere about this directory?

Upvotes: 0

Views: 993

Answers (2)

Assaf Mendelson

Reputation: 12991

You have to do it manually; however, you can use the Hadoop FileSystem API to do it.

For example:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    // The second argument enables recursive deletion of the directory and its contents
    fs.delete(new Path("applicationsPartitioned"), true)

This makes the deletion OS- and filesystem-independent.
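
For example, run from the Spark shell against the directory in the question, you could guard the delete with an existence check first (a minimal sketch; fs.exists is part of the same Hadoop FileSystem API):

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val out = new Path("applicationsPartitioned")
    // Only attempt the delete if the output directory actually exists
    if (fs.exists(out)) {
      fs.delete(out, true)
    }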

Upvotes: 1

koiralo

Reputation: 23099

I think you have to remove the directory manually, but if you are only removing it so you can write the next output to the same location, you can simply use mode(SaveMode.Overwrite) to overwrite the existing directory:

    import org.apache.spark.sql.SaveMode

    jdbcDF.write.format("orc")
      .mode(SaveMode.Overwrite)
      .partitionBy("ID")
      .save("applicationsPartitioned")
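
Note that with the default settings, SaveMode.Overwrite deletes the entire target directory before writing. As a sketch, on Spark 2.3+ you can instead replace only the partitions present in the new data by setting spark.sql.sources.partitionOverwriteMode:

    // Sketch for Spark 2.3+: replace only the partitions contained in jdbcDF,
    // leaving other existing partitions in place
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    jdbcDF.write.format("orc")
      .mode(SaveMode.Overwrite)
      .partitionBy("ID")
      .save("applicationsPartitioned")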

Hope this helps!

Upvotes: 1
