Spark error "Output directory file already exists

Question

I executed simple sample (spark, Windows7) and get unexpected error message FileAlreadyExistsException. I cannot find the folder or file on my computer.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/PluralsightData/ReadMeWordCountViaApp already exists at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1191) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)

package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object WordCounter {
    def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Word Counter")
        val sc = new SparkContext(conf)
        //val textFile = sc.textFile("file:///Spark/README.md")
        val textFile = sc.textFile("file:///README.md")
        val tokenizedFileData = textFile.flatMap(line=>line.split(" "))
        val countPrep = tokenizedFileData.map(word=>(word, 1))
        val counts = countPrep.reduceByKey((accumValue, newValue)=>accumValue + newValue)
        val sortedCounts = counts.sortBy(kvPair=>kvPair._2, false)
        sortedCounts.saveAsTextFile("file:///PluralsightData/ReadMeWordCountViaApp")
    }
}

Sources of the sample can be found https://github.com/constructor-igor/TechSugar/tree/master/ScalaSamples/WordCounterSample

constructor · Accepted Answer

According to comments:

Spark prefer to avoid over-writing any existing data.
Absolute path of target file allows to find result's data on local disk.

sortedCounts.saveAsTextFile("file:///C:/temp/ReadMeWordCountViaApp")

Spark error "Output directory file already exists

Answers (1)

Related Questions

Spark error &quot;Output directory file already exists

Answers (1)

Related Questions

Spark error "Output directory file already exists