Ankita Kukreja

Reputation: 21

Saving an RDD as a text file gives a FileAlreadyExistsException. How to create a new file every time the program runs and delete the old one using FileUtils

Code:

import scala.collection.mutable.ListBuffer
import org.apache.spark.rdd.RDD
val badData: RDD[ListBuffer[String]] = rdd.filter(line => line(1).equals("XX") || line(5).equals("XX"))
badData.coalesce(1).saveAsTextFile(propForFile.getString("badDataFilePath"))

The first time, the program runs fine. On running it again, it throws a FileAlreadyExistsException. I want to resolve this using Java's FileUtils functionality and save the RDD as a text file.
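Something like the sketch below is what I am after (assuming the output path is on the local filesystem, since Apache Commons IO's FileUtils cannot delete HDFS paths):

import java.io.File
import org.apache.commons.io.FileUtils

// Assumption: badDataFilePath points at a local directory, not HDFS.
val outputDir = new File(propForFile.getString("badDataFilePath"))
if (outputDir.exists()) FileUtils.deleteDirectory(outputDir) // wipe the previous run's output
badData.coalesce(1).saveAsTextFile(outputDir.getPath)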

Upvotes: 0

Views: 970

Answers (3)

Pyd

Reputation: 6159

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
if (fs.exists(new Path("path/to/the/files")))
    fs.delete(new Path("path/to/the/files"), true)

Pass the path as a String to the method; if the directory or files are present, they will be deleted. Run this piece of code before writing to the output path.
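Put together with the question's write call, the whole flow looks roughly like this (a sketch; badData and propForFile come from the question's context):

import org.apache.hadoop.fs.{FileSystem, Path}

val outputPath = propForFile.getString("badDataFilePath")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
if (fs.exists(new Path(outputPath)))
    fs.delete(new Path(outputPath), true) // recursively remove the old output

badData.coalesce(1).saveAsTextFile(outputPath)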

Upvotes: 1

Dasarathy D R

Reputation: 345

Before you write the file to the specified path, delete the existing output at that path.

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path("bad/data/file/path"), true)

Then perform your usual write process. Hope this resolves the problem.
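FileSystem.delete returns a Boolean and does not throw when the path is absent, so the exists check is optional. A small reusable sketch (deleteIfExists is an illustrative name, not a Spark API):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

// Hypothetical helper: clear last run's output (if any) before writing.
def deleteIfExists(sc: SparkContext, path: String): Boolean = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  fs.delete(new Path(path), true) // returns false when nothing existed
}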

Upvotes: 1

uh_big_mike_boi

Reputation: 3470

Why not use DataFrames? Get the RDD[ListBuffer[String]] into an RDD[Row], something like this:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val badData: RDD[Row] = rdd.map(line =>
    Row(line(0), line(1) /* ... line(n) */))
  .filter(row => /* filter stuff */ true)

// An RDD[Row] needs an explicit schema; toDF() will not compile without an Encoder.
val schema = StructType(Seq(StructField("col0", StringType) /* ... one field per column */))
spark.createDataFrame(badData, schema).write.mode(SaveMode.Overwrite)
  .csv(propForFile.getString("badDataFilePath"))
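With SaveMode.Overwrite, Spark removes the existing output directory before writing, so no manual FileSystem or FileUtils cleanup is needed at all.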

Upvotes: 0
