I have to export a Spark DataFrame to a file (on either S3 or HDFS) and then send that file as an email attachment.
What is the easiest way to do this in Scala?
I tried looking at javax.mail and javax.activation, but I can't figure out how to get a DataSource from a file on S3/HDFS. This is what I have for a local file:
val messageBodyPart = new MimeBodyPart()
// FileDataSource reads from the local file system only
val source: FileDataSource = new FileDataSource(pathToAttachment)
messageBodyPart.setDataHandler(new DataHandler(source))
messageBodyPart.setFileName(pathToAttachment)
multipart.addBodyPart(messageBodyPart)
You didn't give enough information (Spark version, data size, attachment file type, ...), so I'll make some assumptions.
Suppose you're using the JavaMail API, Spark 1.6 and HDFS, and you want to send a CSV file as an attachment.
First, save your DataFrame:
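// Spark 1.6 needs the external spark-csv package for this format
DF.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/pathToFolder/")
// coalesce(1) leaves a single part file in the folder; substitute its generated name
val filePath = "/pathToFolder/part-xxxx"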
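Because Spark generates the part file name, it is not known in advance. A minimal sketch, assuming hadoop-common is on the classpath and the DataFrame was coalesced to one partition, that lists the output folder and picks the only `part-*` file (`PartFileLocator` and `findSinglePartFile` are hypothetical names):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PartFileLocator {
  // Returns the single "part-*" file Spark wrote into `outputDir`.
  // Assumes coalesce(1) was used, so exactly one part file exists
  // (marker files such as _SUCCESS are ignored by the name filter).
  def findSinglePartFile(outputDir: String, conf: Configuration = new Configuration()): Path = {
    val dir = new Path(outputDir)
    // The URI scheme of outputDir (or fs.defaultFS) selects the file system
    val fs: FileSystem = dir.getFileSystem(conf)
    val parts = fs.listStatus(dir)
      .map(_.getPath)
      .filter(_.getName.startsWith("part-"))
    require(parts.length == 1, s"expected exactly one part file in $outputDir, found ${parts.length}")
    parts.head
  }
}
```

The returned `Path` can be passed straight to `hdfs.open(...)` instead of a hard-coded `filePath`.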
Then open the file from HDFS:
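import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val namenode = "hdfs://..."
val hdfs = FileSystem.get(new URI(namenode), new Configuration())
val path = new Path(filePath)
val stream = hdfs.open(path)
val fileName = "mydata.csv" // name the recipient will see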
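The question also mentions S3. Hadoop's `FileSystem` API picks its implementation from the URI scheme, so the same code can read `hdfs://`, `file://` or `s3a://` paths; the `s3a://` case assumes the hadoop-aws module and S3 credentials are configured. A minimal sketch (`HadoopFileReader` and `readAllBytes` are hypothetical names) that loads a small file fully into memory:

```scala
import java.io.ByteArrayOutputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object HadoopFileReader {
  // Reads the whole file at `uri` into a byte array.
  // Only suitable for small files, since everything is held in memory.
  def readAllBytes(uri: String, conf: Configuration = new Configuration()): Array[Byte] = {
    val path = new Path(uri)
    // hdfs://, s3a://, file://: the scheme selects the FileSystem implementation
    val fs = path.getFileSystem(conf)
    val in = fs.open(path)
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      out.toByteArray
    } finally in.close()
  }
}
```

The resulting bytes can be wrapped with `new ByteArrayDataSource(bytes, "text/csv")`, which avoids keeping the remote stream open while the message is built.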
Finally, build the attachment:
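import javax.activation.DataHandler
import javax.mail.internet.MimeBodyPart
import javax.mail.util.ByteArrayDataSource
val messageBodyPart = new MimeBodyPart()
messageBodyPart.setDataHandler(new DataHandler(new ByteArrayDataSource(stream, "text/csv")))
stream.close() // ByteArrayDataSource has already read the whole stream into memory
messageBodyPart.setFileName(fileName)
multipart.addBodyPart(messageBodyPart)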
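The snippet above relies on a `multipart` built elsewhere. A fuller sketch of assembling the whole message, assuming JavaMail 1.x (`javax.mail`) is on the classpath; `CsvMail`, `buildMessage` and the body text are hypothetical:

```scala
import javax.activation.DataHandler
import javax.mail.{Message, Session}
import javax.mail.internet.{InternetAddress, MimeBodyPart, MimeMessage, MimeMultipart}
import javax.mail.util.ByteArrayDataSource

object CsvMail {
  // Builds (but does not send) a message with a short text body and the
  // CSV bytes as an attachment. The body text is a placeholder.
  def buildMessage(session: Session, from: String, to: String, subject: String,
                   csv: Array[Byte], fileName: String): MimeMessage = {
    val textPart = new MimeBodyPart()
    textPart.setText("Please find the exported data attached.")

    val attachmentPart = new MimeBodyPart()
    attachmentPart.setDataHandler(new DataHandler(new ByteArrayDataSource(csv, "text/csv")))
    attachmentPart.setFileName(fileName)

    val multipart = new MimeMultipart()
    multipart.addBodyPart(textPart)
    multipart.addBodyPart(attachmentPart)

    val message = new MimeMessage(session)
    message.setFrom(new InternetAddress(from))
    message.addRecipient(Message.RecipientType.TO, new InternetAddress(to))
    message.setSubject(subject)
    message.setContent(multipart)
    message
  }
}
```

To actually send it, create the `Session` from a `java.util.Properties` with `mail.smtp.host` set to your SMTP server and pass the message to `Transport.send`.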
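Important: this example assumes Spark 1.6 and a small dataset. coalesce(1) pulls all the data into a single partition and ByteArrayDataSource buffers the whole file in memory, which is only reasonable for data small enough to be sent as an email attachment in the first place.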
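Since the approach only works for small files, it can help to check the file size before attaching it. A minimal sketch, assuming hadoop-common on the classpath; `AttachmentSizeCheck` is a hypothetical name and the 10 MB default is an arbitrary placeholder, not a limit taken from any mail server:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object AttachmentSizeCheck {
  // Placeholder limit; set it to what your SMTP server accepts.
  // Base64 encoding in MIME grows the attachment by about a third.
  val DefaultMaxBytes: Long = 10L * 1024 * 1024

  // True if the file at `uri` is no larger than `maxBytes`.
  def fitsAsAttachment(uri: String, maxBytes: Long = DefaultMaxBytes,
                       conf: Configuration = new Configuration()): Boolean = {
    val path = new Path(uri)
    path.getFileSystem(conf).getFileStatus(path).getLen <= maxBytes
  }
}
```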