DJElbow

Reputation: 3463

Spark - on EMR saveAsTextFile won't write data to local dir

I'm running Spark on EMR (AMI 3.8). When I try to write an RDD to a local file, no results appear on the name/master node.

On my previous EMR cluster (same version of Spark, installed with a bootstrap script instead of as an add-on to EMR), the data would write to the local dir on the name node. Now I can see it appearing in "/home/hadoop/test/_temporary/0/task*" directories on the other nodes in the cluster, but only the "_SUCCESS" file on the master node.

How can I get the file to write to the name/master node only?

Here is an example of the command I am using:

myRDD.saveAsTextFile("file:///home/hadoop/test")

Upvotes: 1

Views: 1498

Answers (1)

DJElbow

Reputation: 3463

I can do this in a roundabout way by pushing to HDFS first, then writing the results to the local filesystem with shell commands. But I would love to hear if others have a more elegant approach.

  import org.apache.spark.rdd.RDD
  import scala.sys.process._

  // Save an RDD to HDFS, then merge its part files into a single
  // local file at the same path on the driver/master node
  def rddToFile(rdd: RDD[_], filePath: String): Unit = {

    // Shell commands: the "> filePath" redirect is evaluated by the
    // local bash, so the merged output lands on the local filesystem
    val createFileStr = "hadoop fs -cat " + filePath + "/part* > " + filePath
    val removeDirStr  = "hadoop fs -rm -r " + filePath

    // Remove the HDFS dir in case it already exists
    Process(Seq("bash", "-c", removeDirStr)).!

    // Save the data to HDFS
    rdd.saveAsTextFile(filePath)

    // Concatenate the HDFS part files into a single local file
    Process(Seq("bash", "-c", createFileStr)).!

    // Remove the HDFS dir
    Process(Seq("bash", "-c", removeDirStr)).!

  }
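
For small outputs, a shell-free alternative (a sketch of my own, not tested at scale) is to collect the RDD to the driver and write it with plain Java I/O; collect pulls everything into driver memory, so this only suits results that fit there. The local-write half is plain Scala, and `writeLines` is a hypothetical helper name:

```scala
import java.io.PrintWriter

// Write a sequence of lines to a single local file on the driver.
// With Spark this would be invoked roughly as:
//   writeLines(myRDD.collect().map(_.toString), "/home/hadoop/test.txt")
def writeLines(lines: Seq[String], path: String): Unit = {
  val out = new PrintWriter(path)
  try lines.foreach(out.println) finally out.close()
}
```

This keeps everything inside the JVM, at the cost of being limited by driver memory.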

Upvotes: 0
