Reputation: 12720
A Spark RDD has the saveAsTextFile function. However, how do I open a file and write a simple string to a Hadoop store?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
val sparkConf: SparkConf = new SparkConf().setAppName("example")
val sc: SparkContext = new SparkContext(sparkConf)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "...")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "...")
val lines: RDD[String] = sc.textFile("s3n://your-output-bucket/lines.txt")
val lengths: RDD[Int] = lines.map(_.length)
lengths.saveAsTextFile("s3n://your-output-bucket/lengths.txt")
val numLines: Long = lines.count
val resultString: String = s"numLines: $numLines"
// how to save resultString to "s3n://your-output-bucket/result.txt"
sc.stop()
Upvotes: 0
Views: 1023
Reputation: 12998
Assuming you have a SparkContext bound to sc:
import java.io.{BufferedWriter, OutputStreamWriter}
val hdfs = org.apache.hadoop.fs.FileSystem.get(sc.hadoopConfiguration)
val outputPath = new org.apache.hadoop.fs.Path("hdfs://localhost:9000/tmp/hello.txt")
val overwrite = true
// hdfs.create returns an output stream; wrap it in a buffered character writer
val bw = new BufferedWriter(new OutputStreamWriter(hdfs.create(outputPath, overwrite)))
bw.write("Hello, world")
bw.close()
Note: to keep it simple, there is no code to close the writer if an exception is thrown.
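The missing cleanup could be sketched roughly as below. This is only an illustration: `HadoopText` and `writeString` are hypothetical names, not part of Spark or Hadoop, and the UTF-8 charset is an assumption. The helper accepts any `java.io.OutputStream`, so it can be tried without a cluster.

```scala
import java.io.{BufferedWriter, OutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark or Hadoop.
object HadoopText {
  // Writes `text` to `out` and closes the stream even if the write throws.
  def writeString(out: OutputStream, text: String): Unit = {
    val writer = new BufferedWriter(
      new OutputStreamWriter(out, StandardCharsets.UTF_8)) // charset is an assumption
    try {
      writer.write(text)
    } finally {
      writer.close() // flushes the buffer and closes the underlying stream
    }
  }
}
```

With the `hdfs`, `outputPath` and `overwrite` values from the snippet above, the call would presumably look like `HadoopText.writeString(hdfs.create(outputPath, overwrite), "Hello, world")`, since the stream returned by `create` is an ordinary `OutputStream`.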
Upvotes: 1
Reputation: 2864
Why not do the following?
val strings = sc.parallelize(Seq("hello", "there"), <numPartitions>)
strings.saveAsTextFile("<path-to-file>")
Otherwise, you may need to use the Hadoop API to write the file and call that code explicitly from your driver.
Upvotes: 1