Parse email and send reply using googleapiclient in Python

Question

I'm currently working on a project and I have chosen to use Gmail for sending and receiving emails. I want to be able to send an email, have a user reply to it, and parse their response. The response can be any number of lines (so something like response.split(' ')[0] won't work). It should then be able to reply directly to that email thread.

I've been following the googleapiclient tutorials, but they leave a lot to be desired. However, I've managed to read email threads using:

service.users().threads().get(userId='me', id=thread_id).execute()

where thread_id is (predictably) the ID of the email thread (which I find elsewhere). In the large dict returned by this, there is a section of base64 data which contains the content of the email. This was the only place I could find the actual data for the response. Unfortunately, I get this when it is decoded:

b'This is my response from my phone

On Sat, 28 Nov 2020, 8:40 PM , 
wrote:

> This is sent from the python script
>
'

This is all the data in the thread. However, I only want the response, and there is no obvious way to split this to get only the data I need. The best I can think of is to parse out anything of the form "On DATE, EMAIL wrote:", but that could lead to problems. There must be another way to extract only "This is my response from my phone" and no other data.

Once I get the response, I want to parse it and reply with an appropriate response based on the contents of the message. I would prefer to reply directly to the thread, rather than starting a new one. Unfortunately, all the Google documentation says is:

If you're trying to send a reply and want the email to thread, make sure that:

The Subject headers match

The References and In-Reply-To headers follow the RFC 2822 standard.

The documentation provides this code (with some minor modifications by me) for sending an email:

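import base64
from email.mime.text import MIMEText

def create_message(sender, to, subject, message_text):
  # Build a plain-text MIME message with the basic headers.
  message = MIMEText(message_text)
  message['to'] = to
  message['from'] = sender
  message['subject'] = subject
  # The Gmail API expects the whole message base64url-encoded under 'raw'.
  return {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}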

Sending a reply with the same subject line is pretty straightforward (message['subject'] = same_subject_as_before), but I don't even know where to start with the References and In-Reply-To headers. How do I set these?

traal · Accepted Answer

Why is this hard?

You are trying to use e-mail for something it simply wasn't originally designed for. My impression is you want the e-mail response to contain structured data, but e-mail text lacks any well-defined structure. It also depends on which e-mail client the other user has, and whether they send HTML e-mail or not.

Separating a reply from the text it quotes is usually easy for a human, but difficult for a computer, which suggests that machine learning might be the best strategy if you want higher reliability. Whatever solution you choose, it's not going to be 100% reliable.

- E-mail can be plain text or HTML, or both.
- There is no well-defined structure to separate replies from the original text. Wikipedia lists a few different "posting styles".
  - In the old days when "Netiquette" was still cool, putting your reply on top ("top-posting") was considered bad practice, and new Internet users were told by old folks to avoid top-posting. Some users still reply below or interleaved with the original text.
- The reply line (e.g. "On DATE, EMAIL wrote:" or "-------- Original Message --------") will be different, depending on which e-mail client is used, what language that client is set to, and the user's own preferences.
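
To make the fragility concrete, here is a minimal sketch of the naive heuristic approach. It assumes a plain-text, top-posted reply and only knows the two English reply lines mentioned above. The name strip_quoted_text and the pattern itself are my own illustration, not part of any library.

```python
import re

# Matches "On ... wrote:" (possibly wrapped onto a second line, as Gmail does)
# or "---- Original Message ----". Assumption: English-language clients only.
REPLY_HEADER = re.compile(
    r"^(?:On\b[^\n]*(?:\n[^\n]*)?\bwrote:|-{2,} ?Original Message ?-{2,})[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_quoted_text(body: str) -> str:
    """Return the text above the first recognised reply line."""
    match = REPLY_HEADER.search(body)
    if match:
        body = body[:match.start()]
    # Drop any remaining lines that are quoted with a leading ">".
    kept = [line for line in body.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()
```

A client set to another language, or a user whose own text contains a line like "On Monday I wrote:", will defeat this, which is exactly the problem described in the list above.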

Using a text delimiter

Customer service applications, which allow operators to use e-mail for communication, face a problem similar to the one you describe. A common strategy is to inject some unique text in your templates for outgoing e-mail. For example, Zendesk uses a text "delimiter" such as:

##- Please type your reply above this line -##

This serves two purposes: it tells users to top-post, and it provides a separator to cut out most of the irrelevant text.

If you first handle any HTML encoding, you should be able to split the message by such a text delimiter. It's not perfect, but it usually works.
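
A minimal sketch of that idea, assuming your outgoing template contains the Zendesk-style delimiter shown above. The function name extract_reply_above_delimiter is hypothetical, and the HTML handling is deliberately crude (a real HTML parser would be more robust).

```python
import html
import re

# Assumption: this exact line was injected into the outgoing e-mail.
DELIMITER = "##- Please type your reply above this line -##"


def extract_reply_above_delimiter(body: str, delimiter: str = DELIMITER) -> str:
    """Return the text above the delimiter, or the whole text if it is absent."""
    # Crude HTML handling: line-breaking tags become newlines, other tags are
    # dropped, then entities such as &amp; are decoded. Note that this also
    # removes anything in angle brackets from plain-text mail.
    text = re.sub(r"<br\s*/?>|</(?:p|div)>", "\n", body, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)

    reply, found, _ = text.partition(delimiter)
    if not found:
        return text.strip()
    # The quoted delimiter is often prefixed with "> ", so trim that too.
    return reply.rstrip("> \t\r\n").strip()
```

The client's reply line ("On DATE, EMAIL wrote:") usually sits above the quoted delimiter, so some leftover text may still need to be removed afterwards.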

Use products made by others

There are some open source options, such as:

https://github.com/zapier/email-reply-parser

And I found a commercial product, SigParser, which seems to use a machine learning model that they've trained very carefully:

https://sigparser.com/developers/extract-reply-chains-from-emails/

They also explain some of the challenges of parsing e-mail text into structured data.
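
Sending the reply into the same thread

For the threading part of the question, here is a minimal sketch that extends the question's create_message. It assumes you have already read the original message's Message-ID header (and its References header, if any) from the thread you fetched. The function name create_reply and its parameter names are my own; the threadId field in the request body is what the Gmail API uses to attach a sent message to an existing thread.

```python
import base64
from email.mime.text import MIMEText


def create_reply(sender, to, subject, message_text,
                 thread_id, original_message_id, original_references=""):
    """Build a send() body that replies within an existing Gmail thread."""
    message = MIMEText(message_text)
    message["to"] = to
    message["from"] = sender
    message["subject"] = subject  # must match the original subject
    # In-Reply-To names the message being answered; References carries the
    # earlier chain followed by that same Message-ID.
    message["In-Reply-To"] = original_message_id
    message["References"] = f"{original_references} {original_message_id}".strip()
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
    return {"raw": raw, "threadId": thread_id}
```

The returned dict would then be passed as body to service.users().messages().send(userId='me', body=...).execute(), in the same way as the dict returned by create_message.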
