Raman Narasimhan

How do I attach a file from S3/HDFS to an email in Spark (Scala)?

I have to export a Spark DataFrame to a file (on either S3 or HDFS) and then send that file as an email attachment.

What is the easiest way to do this in Scala?

I tried looking at javax.mail and javax.activation, but I can't figure out how to get a DataSource from a file on S3/HDFS:

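  import javax.activation.{DataHandler, FileDataSource}
  import javax.mail.internet.MimeBodyPart

  // FileDataSource wraps a java.io.File, so it only sees the local file system, not S3/HDFS
  val messageBodyPart = new MimeBodyPart()
  val source: FileDataSource = new FileDataSource(pathToAttachment)
  messageBodyPart.setDataHandler(new DataHandler(source))
  messageBodyPart.setFileName(pathToAttachment)
  multipart.addBodyPart(messageBodyPart)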


Answers (1)

soupe

You didn't give enough information (Spark version, data size, attachment file type, ...).

Suppose you're using the JavaMail API, Spark 1.6 and HDFS, and you want to send a CSV file as an attachment.

First, save your DataFrame as a single CSV file:

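// coalesce(1) makes Spark write a single part file inside the output folder
DF.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/pathToFolder/")
// Spark generates the part file name; check the folder to find the actual one
val filePath = "/pathToFolder/part-xxxx"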
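Since Spark chooses the part file name, hard-coding it is fragile. A minimal sketch, assuming the Hadoop client libraries are on the classpath (they are in a Spark job), that looks the file up with `FileSystem.globStatus`; the helper name `singlePartFile` is hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: returns the single part-* file Spark wrote into `folder`.
// After coalesce(1) there should be exactly one; _SUCCESS and hidden .crc files
// do not match the pattern and are ignored.
def singlePartFile(folder: String, conf: Configuration = new Configuration()): String = {
  val pattern = new Path(folder, "part-*")
  val fs = pattern.getFileSystem(conf)
  // globStatus may return null instead of an empty array, so guard against it
  val matches = Option(fs.globStatus(pattern)).map(_.toSeq).getOrElse(Seq.empty)
  require(matches.size == 1, s"expected one part file in $folder, found ${matches.size}")
  matches.head.getPath.toString
}
```

With this, `val filePath = singlePartFile("/pathToFolder/")` replaces the hard-coded name. Failing loudly when there is not exactly one match is a deliberate choice: it catches the case where `coalesce(1)` was forgotten.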

Then open the file from HDFS:

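import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
val namenode = "hdfs://..."
val hdfs = FileSystem.get(new URI(namenode), new Configuration())
val path = new Path(filePath)
val stream = hdfs.open(path) // FSDataInputStream, which is a java.io.InputStream
val fileName = "mydata.csv" // name the recipient will see for the attachment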
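If you would rather not copy the file into memory, another option is to implement `javax.activation.DataSource` yourself on top of the Hadoop FileSystem API, which also covers the S3 part of the question through the `s3a://` scheme (assuming the hadoop-aws module and S3 credentials are configured). A sketch; the class name `HadoopFileDataSource` is hypothetical:

```scala
import java.io.{IOException, InputStream, OutputStream}
import javax.activation.DataSource
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical class: a read-only DataSource backed by the Hadoop FileSystem API.
// The scheme of `uri` selects the file system (hdfs://, file://, or s3a:// when
// hadoop-aws is available); a path without a scheme uses fs.defaultFS.
class HadoopFileDataSource(uri: String, contentType: String,
                           conf: Configuration = new Configuration()) extends DataSource {
  private val path = new Path(uri)

  // Open a fresh stream on every call, since JavaMail may read the content more than once
  override def getInputStream(): InputStream = path.getFileSystem(conf).open(path)

  override def getOutputStream(): OutputStream =
    throw new IOException("HadoopFileDataSource is read-only")

  override def getContentType(): String = contentType

  // Last path component, e.g. "part-00000"
  override def getName(): String = path.getName
}
```

You would then use `new DataHandler(new HadoopFileDataSource(filePath, "text/csv"))` and still call `setFileName(fileName)`, because the part file name is not a friendly attachment name. The file is streamed when the message is written out instead of being copied into a byte array first.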

Finally, set the attachment:

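import javax.activation.DataHandler
import javax.mail.internet.MimeBodyPart
import javax.mail.util.ByteArrayDataSource
val messageBodyPart = new MimeBodyPart()
// ByteArrayDataSource copies the whole stream into memory (fine for small files)
messageBodyPart.setDataHandler(new DataHandler(new ByteArrayDataSource(stream, "text/csv")))
messageBodyPart.setFileName(fileName)
multipart.addBodyPart(messageBodyPart)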
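For completeness, a sketch of how the attachment part might fit into a whole message with the JavaMail API. The helper names, the subject and body text, and the SMTP host are placeholders, and authentication/TLS settings are not shown. The attachment is taken as any `DataSource`, so a `ByteArrayDataSource` built from the HDFS stream works here:

```scala
import java.util.Properties
import javax.activation.{DataHandler, DataSource}
import javax.mail.{Message, Session}
import javax.mail.internet.{InternetAddress, MimeBodyPart, MimeMessage, MimeMultipart}

// Hypothetical helper: a session that talks to a plain SMTP host (no auth/TLS shown)
def smtpSession(host: String): Session = {
  val props = new Properties()
  props.put("mail.smtp.host", host)
  Session.getInstance(props)
}

// Hypothetical helper: a message with a short text body plus one attachment
def buildMessage(session: Session, from: String, to: String,
                 attachment: DataSource, fileName: String): MimeMessage = {
  val message = new MimeMessage(session)
  message.setFrom(new InternetAddress(from))
  message.addRecipient(Message.RecipientType.TO, new InternetAddress(to))
  message.setSubject("Exported data")

  val textPart = new MimeBodyPart()
  textPart.setText("The exported DataFrame is attached.")

  val attachmentPart = new MimeBodyPart()
  attachmentPart.setDataHandler(new DataHandler(attachment))
  attachmentPart.setFileName(fileName)

  // multipart/mixed: the text part first, then the attachment
  val multipart = new MimeMultipart()
  multipart.addBodyPart(textPart)
  multipart.addBodyPart(attachmentPart)
  message.setContent(multipart)
  message
}
```

Sending is then a call to `javax.mail.Transport.send(message)`, with `message` built from `smtpSession` and `buildMessage`.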

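Important: this example uses Spark 1.6 and assumes a small dataset, which is reasonable when the goal is to send a DataFrame as an email attachment: coalesce(1) moves all the data into a single partition, and ByteArrayDataSource reads the whole file into memory.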
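Because of that size caveat, it may be worth checking the file length before attaching it. A minimal sketch using `FileSystem.getFileStatus`; the helper name `fitsInEmail` and the choice of limit are assumptions, so take `maxBytes` from your mail server's message size limit:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: true if the file at `filePath` is at most `maxBytes` long.
// The transfer encoding (for example Base64, which adds about a third) can make
// the final message larger than the file itself, so leave some headroom.
def fitsInEmail(filePath: String, maxBytes: Long, conf: Configuration = new Configuration()): Boolean = {
  val path = new Path(filePath)
  path.getFileSystem(conf).getFileStatus(path).getLen <= maxBytes
}
```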
