Reputation: 143
I am running a Spark job, and it keeps failing with "output folder already exists" exceptions. I did remove the output folder before starting the job. It looks like the folder is created during the job and then confuses other nodes/threads. It happens randomly, but not always.
Upvotes: 0
Views: 1515
Reputation: 918
df.write().format("parquet").mode(SaveMode.Overwrite).save("location");
SaveMode.Overwrite replaces any existing output at the target path, so this should resolve the "output folder already exists" failure. (Note that write() is a method on a Dataset/DataFrame, not on an RDD, so convert the RDD to a DataFrame first.)
Upvotes: 2
Reputation: 148
If you are writing to a local filesystem path, be aware that the output folder gets created on every worker, not just the driver. So you probably have to delete it from all of them.
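A minimal sketch of the per-machine cleanup in plain Java (the class name, method name, and the path /data/spark/output are placeholders, not anything from Spark itself); on a real cluster you would run the same delete on every worker node, e.g. over ssh:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutput {
    // Recursively delete a directory tree, deepest entries first,
    // so each directory is empty by the time it is deleted.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; on a cluster, run this on every worker.
        deleteRecursively(Paths.get("/data/spark/output"));
    }
}
```

This only covers local paths; for HDFS or another shared filesystem a single delete of the output path before the job is enough.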
Upvotes: 0