Reputation: 186
I am learning how to read and write files in HDFS using Spark/Scala. I am unable to write to an HDFS file: the file is created, but it is empty. I don't know how to write a loop that writes to the file.
The code is:
import scala.collection.immutable.Map
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
// Read the adult CSV file
val logFile = "hdfs://zobbi01:9000/input/adult.csv"
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
val logData = sc.textFile(logFile, 2).cache()
//val logFile = sc.textFile("hdfs://zobbi01:9000/input/adult.csv")
val headerAndRows = logData.map(line => line.split(",").map(_.trim))
val header = headerAndRows.first
val data = headerAndRows.filter(_(0) != header(0))
val maps = data.map(splits => header.zip(splits).toMap)
val result = maps.filter(map => map("AGE") != "23")
result.foreach{
result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
}
If I replace the loop with:
result.foreach{println}
then it works.
But when I use saveAsTextFile inside the loop, I get this error:
<console>:76: error: type mismatch;
found : Unit
required: scala.collection.immutable.Map[String,String] => Unit
result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
Any help would be appreciated.
Upvotes: 1
Views: 570
Reputation: 1057
What is this supposed to do?
result.foreach{
result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
}
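An RDD action cannot be triggered from inside another RDD operation unless a special configuration is set. Here the code does not even compile: foreach expects a function of type Map[String,String] => Unit, but the block evaluates to Unit, the return type of saveAsTextFile.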
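The type mismatch from the question can be reproduced without Spark. The following is a minimal sketch, assuming a plain List in place of the RDD and a hypothetical save() in place of saveAsTextFile; neither name comes from the original code.

```scala
import scala.collection.mutable.ListBuffer

// Assumption: a plain List stands in for the RDD of row maps.
val rows: List[Map[String, String]] = List(
  Map("AGE" -> "39"),
  Map("AGE" -> "50")
)

// Hypothetical stand-in for saveAsTextFile: takes no row and returns Unit.
def save(): Unit = ()

// rows.foreach { save() }
// The line above is rejected by the compiler with a type mismatch:
// the block evaluates to Unit, but foreach needs a function of one row.

// foreach accepts a function that receives one element at a time.
val seen = ListBuffer.empty[String]
rows.foreach(row => seen += row("AGE"))
```

println fits where save() does not because println can be used as a function of one argument, so foreach can call it once per row.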
Just use result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
to save to HDFS.
If you need a different format in the output file, transform the RDD itself before writing.
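As an illustration of changing the format first, here is a minimal sketch that turns each row map back into a comma-separated line. It runs on a plain Seq so it can be tried without a cluster; the sample values, the WORKCLASS column, and the helper name toCsvLine are assumptions made for the example, not part of the original code.

```scala
// Assumption: two made-up rows in the same shape as `result` in the question.
val header = Array("AGE", "WORKCLASS")
val result = Seq(
  Map("AGE" -> "39", "WORKCLASS" -> "State-gov"),
  Map("AGE" -> "50", "WORKCLASS" -> "Private")
)

// Hypothetical helper: format one row as a CSV line, in header order.
// A missing column becomes an empty field.
def toCsvLine(row: Map[String, String]): String =
  header.map(col => row.getOrElse(col, "")).mkString(",")

val lines = result.map(toCsvLine)

// On the real RDD the same map step would come before the single write:
// result.map(toCsvLine).saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
```

Without a step like this, saveAsTextFile writes the string form of each element, so every line would look like Map(AGE -> 39, ...). Note also that saveAsTextFile creates a directory of part files at the given path rather than a single file.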
Upvotes: 1
Reputation: 23109
result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
This is all you need to do. You don't need to loop through the rows.
Hope this helps!
Upvotes: 1