jthemovie

Reputation: 188

Batch Request for get in gmail API

I have a list of around 2,500 mail IDs and I'm limited to the requests library. So far I fetch the mail headers like this:

mail_ids = ['']
for mail_id in mail_ids:
    res = requests.get(
        'https://www.googleapis.com/gmail/v1/users/me/messages/{}?format=metadata'.format(mail_id),
        headers=headers).json()
    mail_headers = res['payload']['headers']
    ...

But it's very inefficient, and I would rather POST a list of IDs instead. In their documentation, https://developers.google.com/gmail/api/v1/reference/users/messages/get, I don't see a BatchGet. Is there any workaround? I'm using the Flask framework. Thanks a lot.

Upvotes: 3

Views: 1706

Answers (1)

Utkarsh Dalal

Reputation: 253

This is a bit late, but in case it helps anyone, here's the code I used to do a batch get of emails:

  1. First, I get a list of the relevant emails. Change the request to suit your needs; I'm getting only sent emails for a certain time period:
import json
import requests

# `header` is the Authorization header that is set up in the next step.
query = "https://www.googleapis.com/gmail/v1/users/me/messages?labelIds=SENT&q=after:2020-07-25 before:2020-07-31"
response = requests.get(query, headers=header)
events = json.loads(response.content)
# 'messages' is absent when a page has no results, so default to an empty list.
email_tokens = events.get('messages', [])
while 'nextPageToken' in events:
    response = requests.get(query + f"&pageToken={events['nextPageToken']}",
                            headers=header)
    events = json.loads(response.content)
    email_tokens += events.get('messages', [])
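The same pagination loop can also be written as a small function that lets requests build the query string. This is only a sketch: `list_message_ids` and its `get` parameter are names made up for illustration, and `get` exists just so the loop can be tried without hitting the network.

```python
LIST_URL = "https://www.googleapis.com/gmail/v1/users/me/messages"


def list_message_ids(header, query, label="SENT", get=None):
    """Collect message IDs across all result pages (hypothetical helper)."""
    if get is None:
        # Imported lazily so a stand-in `get` can be used without requests.
        import requests
        get = requests.get
    params = {"labelIds": label, "q": query}
    ids = []
    while True:
        page = get(LIST_URL, headers=header, params=params).json()
        # A page with no results has no 'messages' key.
        ids.extend(message["id"] for message in page.get("messages", []))
        token = page.get("nextPageToken")
        if not token:
            return ids
        # Ask for the next page on the following iteration.
        params["pageToken"] = token
```

It would be called as `list_message_ids(header, "after:2020-07-25 before:2020-07-31")` and returns plain ID strings rather than the `{'id': ...}` dicts used in the code above.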
  2. Then I batch the get requests to fetch 100 emails at a time, parse only the JSON part of each response, and append it to a list called emails. Note that there's some repeated code here, so you may want to refactor it into a function. You'll have to set your access token here:
emails = []
access_token = '1234'
header = {'Authorization': 'Bearer ' + access_token}
batch_header = header.copy()
batch_header['Content-Type'] = 'multipart/mixed; boundary="email_id"'
data = ''
ctr = 0
for token_dict in email_tokens:
    data += f'--email_id\nContent-Type: application/http\n\nGET /gmail/v1/users/me/messages/{token_dict["id"]}?format=full\n\n'
    if ctr == 99:
        data += '--email_id--'
        r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                          headers=batch_header, data=data)
        bodies = r.content.decode().split('\r\n')
        for body in bodies:
            if body.startswith('{'):
                parsed_body = json.loads(body)
                emails.append(parsed_body)
        ctr = 0
        data = ''
        continue
    ctr+=1
if data:  # send the final partial batch only if some IDs are left over
    data += '--email_id--'
    r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                      headers=batch_header, data=data)
    bodies = r.content.decode().split('\r\n')
    for body in bodies:
        if body.startswith('{'):
            parsed_body = json.loads(body)
            emails.append(parsed_body)
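As a rough sketch of the refactoring mentioned above, the repeated pieces could be pulled into three small helpers. The names `chunks`, `build_batch_body` and `extract_json_bodies` are made up for illustration; the body layout and the "line starts with `{`" parsing simply mirror the code above and are not a general multipart parser.

```python
import json


def chunks(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_body(message_ids, boundary="email_id", fmt="full"):
    """Build the multipart/mixed body for one batch of message IDs."""
    parts = [
        f"--{boundary}\nContent-Type: application/http\n\n"
        f"GET /gmail/v1/users/me/messages/{message_id}?format={fmt}\n\n"
        for message_id in message_ids
    ]
    parts.append(f"--{boundary}--")  # closing boundary
    return "".join(parts)


def extract_json_bodies(response_text):
    """Parse every line of the batch response that looks like a JSON object."""
    return [json.loads(line)
            for line in response_text.split("\r\n")
            if line.startswith("{")]
```

The loop then reduces to something like `for batch in chunks([t["id"] for t in email_tokens]):`, posting `build_batch_body(batch)` with `batch_header` and extending `emails` with `extract_json_bodies(r.content.decode())`.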
  3. [Optional] Finally, I decode the text of each email and store only the last sent message instead of the whole thread. The regex used here splits on strings that I found usually mark the start of the quoted thread at the end of an email. For instance, On Tue, Jun 23, 2020, [email protected] said...:
import re
import base64
gmail_split_regex = r'On [a-zA-Z]{3}, ([a-zA-Z]{3}|\d{2}) ([a-zA-Z]{3}|\d{2}),? \d{4}'

for email in emails:
    if 'parts' not in email['payload']:
        continue
    for part in email['payload']['parts']:
        if part['mimeType'] == 'text/plain':
            if 'uniqueBody' not in email:
                plainText = base64.urlsafe_b64decode(part['body']['data']).decode('utf-8', errors='replace')
                email['uniqueBody'] = {'content': re.split(gmail_split_regex, plainText)[0]}
        elif 'parts' in part:
            for sub_part in part['parts']:
                if sub_part['mimeType'] == 'text/plain':
                    if 'uniqueBody' not in email:
                        plainText = base64.urlsafe_b64decode(sub_part['body']['data']).decode('utf-8', errors='replace')
                        email['uniqueBody'] = {'content': re.split(gmail_split_regex, plainText)[0]}
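The nested loops above only look two levels deep. A recursive search, sketched below, would find a text/plain part at any depth. `first_plain_text` is a hypothetical helper, and unlike the loop above it also accepts a payload that is itself text/plain; the padding step is a precaution in case the base64url data arrives without trailing `=`.

```python
import base64


def first_plain_text(part):
    """Return the decoded text of the first text/plain part, depth-first, or None."""
    if part.get("mimeType") == "text/plain":
        data = part.get("body", {}).get("data")
        if data:
            # Restore any stripped padding before decoding base64url.
            padded = data + "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for sub_part in part.get("parts", []):
        text = first_plain_text(sub_part)
        if text is not None:
            return text
    return None
```

With this, each email could be handled as `text = first_plain_text(email['payload'])`, followed by the same `re.split(gmail_split_regex, text)[0]` when `text` is not None.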

Upvotes: 1
