Reputation: 186

Iterated writing to an HDFS file using Spark/Scala

I am learning how to read from and write to files in HDFS using Spark/Scala. I am unable to write to an HDFS file: the file is created, but it is empty. I don't know how to write a loop that writes to a file.

The code is:

import scala.collection.immutable.Map
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

// Read the adult CSV file
  val logFile = "hdfs://zobbi01:9000/input/adult.csv"
  val conf = new SparkConf().setAppName("Simple Application")
  val sc = new SparkContext(conf)
  val logData = sc.textFile(logFile, 2).cache()


  //val logFile = sc.textFile("hdfs://zobbi01:9000/input/adult.csv")
  val headerAndRows = logData.map(line => line.split(",").map(_.trim))
  val header = headerAndRows.first
  val data = headerAndRows.filter(_(0) != header(0))
  val maps = data.map(splits => header.zip(splits).toMap)
  val result = maps.filter(map => map("AGE") != "23")

  result.foreach{

      result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
  }

If I replace it with result.foreach{println}, it works.

But when I use saveAsTextFile inside the loop, the following error is thrown:

<console>:76: error: type mismatch;
 found   : Unit
 required: scala.collection.immutable.Map[String,String] => Unit
             result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")

Any help would be appreciated.

Upvotes: 1

Answers (2)

Raktotpal Bordoloi

Reputation: 1057

What is this supposed to do?

 result.foreach{
  result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
 }

An RDD action such as saveAsTextFile cannot be triggered from inside another RDD operation (foreach here) unless a special configuration is set.

Just use result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt") to save to HDFS.
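
The type mismatch in the question can be seen without Spark. Below is a minimal sketch in plain Scala, where a List stands in for the RDD and saveAll is a hypothetical stand-in for saveAsTextFile (both are assumptions for illustration only). foreach expects a function of one element, while the write is a single call on the whole collection.

```scala
// Plain Scala sketch: a List stands in for the RDD (assumption for illustration).
object ForeachVsSingleCall {
  // Hypothetical stand-in for saveAsTextFile: renders every row in one call.
  // Keys are sorted so the rendered line does not depend on Map ordering.
  def saveAll(rows: Seq[Map[String, String]]): String =
    rows
      .map(row => row.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString(","))
      .mkString("\n")

  def main(args: Array[String]): Unit = {
    val rows = List(Map("AGE" -> "30"), Map("AGE" -> "41"))

    // foreach takes a function of one element, so passing println works.
    rows.foreach(println)

    // A block that ignores the element is not a function of one element,
    // which is why the compiler rejects the loop from the question.

    // The write itself is one call on the whole collection, outside any loop.
    println(saveAll(rows))
  }
}
```

With Spark, the corresponding single call is result.saveAsTextFile(...) on its own line, as stated above.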

If you need a different format in the output file, transform the RDD itself before writing.
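
As a rough sketch of what "transform before writing" could look like: each row in the question is a Map[String, String], so it can be turned into a CSV line in header order first. The helper formatRow and the column name WORKCLASS are assumptions made for illustration; only AGE appears in the question.

```scala
object RowFormat {
  // Hypothetical helper: renders one row as a CSV line, in header order.
  // Columns missing from the row are written as empty strings.
  def formatRow(header: Seq[String], row: Map[String, String]): String =
    header.map(col => row.getOrElse(col, "")).mkString(",")

  def main(args: Array[String]): Unit = {
    val header = Seq("AGE", "WORKCLASS") // example column names (assumed)
    val row = Map("WORKCLASS" -> "Private", "AGE" -> "39")

    println(formatRow(header, row)) // prints "39,Private"

    // With Spark, the same function would be mapped over the RDD before the write:
    //   result.map(row => formatRow(header.toSeq, row))
    //         .saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")
  }
}
```

Without such a step, saveAsTextFile writes each element using its toString, so every line would look like Map(AGE -> 39, ...).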

Upvotes: 1

koiralo

Reputation: 23119

result.saveAsTextFile("hdfs://zobbi01:9000/input/test2.txt")

This is all you need to do. You don't need to loop through the rows yourself.

Hope this helps!

Upvotes: 1
