Reputation: 153
I have stored a Spark data frame as an ORC file using spark-shell as follows:
jdbcDF.write.format("orc").partitionBy("ID").save("applicationsPartitioned")
I found out that the data now lives in windows\system32\applicationsPartitioned
How do I properly remove the ORC file? I could just shut down Spark and remove the directory myself, but is there some metadata stored somewhere about this directory?
Upvotes: 0
Views: 993
Reputation: 12991
You have to do it manually; however, you can use the Hadoop FileSystem API to do it.
For example:
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val path = "applicationsPartitioned" // the directory written by save()
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.delete(new Path(path), true) // true = delete recursively
This makes it OS- and filesystem-independent.
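For example, a quick sketch of re-using the same output location from spark-shell (the path "applicationsPartitioned" and jdbcDF come from your question; the fs.exists guard is just a precaution, not strictly required):

import org.apache.hadoop.fs.{FileSystem, Path}

val outputPath = new Path("applicationsPartitioned")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// remove the previous output, if any, before writing again
if (fs.exists(outputPath)) fs.delete(outputPath, true)

jdbcDF.write.format("orc").partitionBy("ID").save("applicationsPartitioned")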
Upvotes: 1
Reputation: 23099
I think you have to remove the directory manually, but if you are only removing it so you can write the next output to the same location, you can simply use mode() to overwrite the existing directory (Overwrite replaces whatever is already at the target path when the new data is written):
jdbcDF.write.format("orc")
.mode(SaveMode.Overwrite)
.partitionBy("ID")
.save("applicationsPartitioned")
Hope this helps!
Upvotes: 1