Reputation: 1410
I ran a simple sample (Spark, Windows 7) and got an unexpected FileAlreadyExistsException. I cannot find the folder or file named in the message anywhere on my computer.
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/PluralsightData/ReadMeWordCountViaApp already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1191)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object WordCounter {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Word Counter")
    val sc = new SparkContext(conf)
    //val textFile = sc.textFile("file:///Spark/README.md")
    val textFile = sc.textFile("file:///README.md")
    val tokenizedFileData = textFile.flatMap(line => line.split(" "))
    val countPrep = tokenizedFileData.map(word => (word, 1))
    val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)
    val sortedCounts = counts.sortBy(kvPair => kvPair._2, false)
    sortedCounts.saveAsTextFile("file:///PluralsightData/ReadMeWordCountViaApp")
  }
}
The source code of the sample is available at https://github.com/constructor-igor/TechSugar/tree/master/ScalaSamples/WordCounterSample
Upvotes: 0
Views: 4251
Reputation: 1410
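According to the comments:
Spark prefers to avoid overwriting existing data, so saveAsTextFile fails when the output directory already exists.
Giving the target an absolute path that includes the drive letter makes it easy to find the result data on the local disk: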
sortedCounts.saveAsTextFile("file:///C:/temp/ReadMeWordCountViaApp")
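Because the save fails whenever the output directory is already there, a second run of the same program will hit the same exception unless the old output is removed first. The following is only a minimal sketch of one way to do that for a local path, using plain java.io.File. The object name OutputDirCleaner and the method deleteRecursively are hypothetical helpers introduced here for illustration; they are not part of Spark or of the original sample.

```scala
import java.io.File

// Hypothetical helper (not part of Spark): removes a local file or directory tree.
object OutputDirCleaner {
  // Deletes f and, if it is a directory, everything beneath it.
  // Returns true if f itself was deleted, false otherwise (for example when it did not exist).
  def deleteRecursively(f: File): Boolean = {
    if (f.isDirectory) {
      // listFiles can return null on an I/O error, so wrap it in Option.
      Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }
}

// Possible usage before the save call (the path is the one from the answer above):
//   OutputDirCleaner.deleteRecursively(new File("C:/temp/ReadMeWordCountViaApp"))
//   sortedCounts.saveAsTextFile("file:///C:/temp/ReadMeWordCountViaApp")
```

This only covers output on the local file system, which matches the file:/// paths used here. It also discards the previous results, so it is only appropriate when the old output is no longer needed.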
Upvotes: 1