BugggyBug

Reputation: 43

How to get a readable email from AWS S3 after it's stored there as an object?

I've set up SES to receive emails on my domain and store them in S3. An SNS notification fires when a new email arrives, which triggers a Lambda that processes the contents of the email. Everything works as expected; however, I can't get any sensible data out of the emails I fetch from S3. For instance, getting the email object from S3 gives me this data:

 <div dir=3D"ltr">ssadsadasdasdas</div><br><div class=3D"gmail_quote"><div d=
ir=3D"ltr" class=3D"gmail_attr">On Tue, Nov 5, 2019 at 5:30 PM Rahul Patil =
&lt;<a href=3D"mailto:[email protected]">[email protected]<=
/a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0=
px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><=
div dir=3D"ltr">asdsadasdasdasd</div><br><div class=3D"gmail_quote"><div di=
r=3D"ltr" class=3D"gmail_attr">On Tue, Nov 5, 2019 at 5:27 PM &lt;<a href=
=3D"mailto:[email protected]" target=3D"_blank">[email protected]</a>&g=
t; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0p=
x 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Body<b=
r>

The code that fetches the data:

const obj = await s3.getObject(getObjectParams).promise();
console.log(obj);
const objectData = obj.Body.toString("utf-8");
console.log(objectData);

I don't need all that HTML; just the sender's email address and the body would be sufficient. Is there a built-in way to filter out the data I need? Are there any Node email-parser modules that can be plugged into the Lambda? More importantly, am I doing it the right way? Thanks!

Upvotes: 1

Views: 1875

Answers (1)

peterh

Reputation: 19275

Yes, you need a parser.

Amazon SES stores incoming emails in S3 in RFC822 format, meaning exactly as they were received over the wire. This is by definition plain text, no matter how complex the email, even if it has attachments. Somewhere inside that RFC822 text there may or may not be some HTML in the body. An email's body can be plain text only, it can be HTML (most common), or it can be both.
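
The `=3D` sequences and the trailing `=` on each line of the data in the question are quoted-printable transfer encoding, which is one of the layers a parser strips away for you. Here is a minimal sketch of that single decoding step, assuming the encoded input is plain ASCII and the decoded bytes are UTF-8 (a real message declares its charset in the Content-Type header). `decodeQuotedPrintable` is a hypothetical helper written for illustration, not part of any library.

```javascript
// Decode quoted-printable text (RFC 2045, section 6.7).
// Assumption: the decoded bytes are UTF-8.
function decodeQuotedPrintable(input) {
  // A "=" at the very end of a line is a soft line break: drop it and the newline.
  const joined = input.replace(/=\r?\n/g, "");
  const bytes = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "=XX" encodes one byte as two hex digits, e.g. "=3D" is "=".
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Anything else is a literal ASCII character.
      bytes.push(joined.charCodeAt(i) & 0xff);
    }
  }
  return Buffer.from(bytes).toString("utf-8");
}
```

Run on a fragment like `<div dir=3D"ltr">Body<b=` followed by `r>` on the next line, this yields `<div dir="ltr">Body<br>`. It does nothing about headers, multipart boundaries or base64 parts, which is why a full parser is still the right tool.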

You'll need to use a library that can parse RFC822. There are quite a few of those; which one to use will depend on your language choice. You'll also need to familiarize yourself with the anatomy of an Internet email message, i.e. RFC822. You'll find a wealth of information on that with a bit of googling. Suggestion: your own email client can most likely save an email in RFC822 format, and you can then use that as an example of what an email truly looks like in its 'native' format. Just have a look at it in your favorite text viewer.
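
To give a feel for that anatomy, here is a rough sketch of the very first thing any RFC822 parser does: split the headers from the body at the first blank line, unfold wrapped header lines, and pull the address out of the From header. It assumes a simple single-part message, and the names `splitMessage` and `senderAddress` are made up for this example. For real mail in Node.js, a library such as mailparser is a common choice, since it also deals with multipart bodies, attachments, charsets and encoded header words.

```javascript
// Split a raw RFC 822 message into a header map and a body string.
// Sketch only: no multipart handling, and a repeated header
// (e.g. Received) keeps only its last value.
function splitMessage(raw) {
  const normalized = raw.replace(/\r\n/g, "\n");
  // Headers end at the first empty line.
  const boundary = normalized.indexOf("\n\n");
  const headerText = boundary === -1 ? normalized : normalized.slice(0, boundary);
  const body = boundary === -1 ? "" : normalized.slice(boundary + 2);

  // Unfold: a line starting with a space or tab continues the previous header.
  const unfolded = headerText.replace(/\n[ \t]+/g, " ");

  const headers = {};
  for (const line of unfolded.split("\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // not a "Name: value" line
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { headers, body };
}

// Get the bare address from a value like "Jane Doe <jane@example.com>".
function senderAddress(fromHeader) {
  const match = /<([^>]+)>/.exec(fromHeader);
  return (match ? match[1] : fromHeader).trim();
}
```

In the Lambda, the string returned by `obj.Body.toString("utf-8")` would be the `raw` argument. The body you get back is still transfer-encoded (quoted-printable or base64) and may contain several MIME parts, so treat this as a way to understand the format rather than a replacement for a parser.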

Your question can be rephrased as an RFC822 parsing question. Some people refer to such files as .eml files; it's the same thing.

Happy hunting.

Upvotes: 2
