Reputation: 41
This is my code to get the body of the email:
body = []
body.append(msg['payload']['parts'])
if 'data' in body[0][0]['body']:
    print("goes path 1")
    body = base64.urlsafe_b64decode(
        body[0][0]['body']['data'])
elif 'data' in body[0][1]['body']:
    print("goes path 2")
    body = base64.urlsafe_b64decode(
        body[0][1]['body']['data'])
else:
    # What Do I do Here?
    pass
I use the if/elif statements because the body is sometimes in a different place, so I have to try both locations. When I ran an email with an attachment through this code, I got a KeyError saying that 'data' does not exist, which means the body is probably somewhere else. The JSON object of the body is in the image linked below because it is too big to paste here. How do I get the body of the email?
https://i.sstatic.net/Ufh5E.png
Edit:
The answers given by @fullfine aren't working. They output another JSON object whose body cannot be decoded for some reason:
binascii.Error: Invalid base64-encoded string: number of data characters (1185) cannot be 1 more than a multiple of 4
and:
binascii.Error: Incorrect padding
An example of a JSON object that I got from their answer is:
{'size': 370, 'data': 'PGRpdiBkaXI9Imx0ciI-WW91IGFyZSBpbnZpdGVkIHRvIGEgWm9vbSBtZWV0aW5nIG5vdy4gPGJyPjxicj5QbGVhc2UgcmVnaXN0ZXIgdGhlIG1lZXRpbmc6IDxicj48YSBocmVmPSJodHRwczovL3pvb20udXMvbWVldGluZy9yZWdpc3Rlci90Sll1Y3VpcnJEd3NHOVh3VUZJOGVEdkQ2NEJvXzhjYUp1bUkiPmh0dHBzOi8vem9vbS51cy9tZWV0aW5nL3JlZ2lzdGVyL3RKWXVjdWlyckR3c0c5WHdVRkk4ZUR2RDY0Qm9fOGNhSnVtSTwvYT48YnI-PGJyPkFmdGVyIHJlZ2lzdGVyaW5nLCB5b3Ugd2lsbCByZWNlaXZlIGEgY29uZmlybWF0aW9uIGVtYWlsIGNvbnRhaW5pbmcgaW5mb3JtYXRpb24gYWJvdXQgam9pbmluZyB0aGUgbWVldGluZy48L2Rpdj4NCg=='}
I figured out that I had to use base64.urlsafe_b64decode to decode the body, which got me b'<div dir="ltr">You are invited to a Zoom meeting now. <br><br>Please register the meeting: <br><a href="https://zoom.us/meeting/register/tJJuyhn4ndhfjrhUFI8eDvD64Bo_8caJumI">https://zoom.us/meeting/register/tJYucuirrDwsG9XwUFI8eDvD64Bo_8caJumI</a><br><br>After registering, you will receive a confirmation email containing information about joining the meeting.</div>\r\n'
How can I remove all the extra HTML tags while keeping the raw text?
Upvotes: 1
Views: 1768
Reputation: 1461
The structure of the response body changes depending on the message itself. You can run some tests to see what it looks like from the documentation page of the method: users.messages.get
Get the message with the id and define the parts.
msg = service.users().messages().get(userId='me', id=message_id['id']).execute()
payload = msg['payload']
parts = payload.get('parts')
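Since the structure changes from message to message, it can help to print the MIME tree of a payload before deciding where to look for the body. The following is only a sketch: describe_parts and sample_payload are hypothetical names, and the sample dictionary is hand-written to resemble a message with one attachment rather than taken from a real API response.

```python
def describe_parts(payload, depth=0):
    """Return one line per MIME part: indentation, mimeType, and where its content lives."""
    body = payload.get("body", {})
    if "data" in body:
        where = "inline data"
    elif "attachmentId" in body:
        where = "attachment"
    else:
        where = "container"
    lines = ["  " * depth + f"{payload.get('mimeType')} ({where})"]
    # Container parts (multipart/*) list their children under 'parts'.
    for part in payload.get("parts", []):
        lines.extend(describe_parts(part, depth + 1))
    return lines


# Hypothetical payload shaped like a message with one attachment.
sample_payload = {
    "mimeType": "multipart/mixed",
    "body": {"size": 0},
    "parts": [
        {
            "mimeType": "multipart/alternative",
            "body": {"size": 0},
            "parts": [
                {"mimeType": "text/plain", "body": {"size": 5, "data": "aGVsbG8="}},
                {"mimeType": "text/html", "body": {"size": 12, "data": "PGI-aGVsbG88L2I-"}},
            ],
        },
        {
            "mimeType": "application/pdf",
            "filename": "a.pdf",
            "body": {"size": 1234, "attachmentId": "hypothetical-id"},
        },
    ],
}

print("\n".join(describe_parts(sample_payload)))
```

With a real message you would pass msg['payload'] instead of the sample dictionary.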
You can find the raw version of the message body in the snippet. As the documentation says, it contains a short part of the message text. It's a simple solution that returns the message without formatting or line breaks. Furthermore, you don't have to decode the result. If it does not fit your requirements, check the next solutions.
raw_message = msg['snippet']
Add a conditional statement to check whether any part of the message has a mimeType equal to multipart/alternative. If so, the message has an attachment and the body is inside that part. You have to get the list of subparts inside that part. Here is the code:
for part in parts:
    body = part.get("body")
    data = body.get("data")
    mimeType = part.get("mimeType")
    # with attachment
    if mimeType == 'multipart/alternative':
        subparts = part.get('parts')
        for p in subparts:
            body = p.get("body")
            data = body.get("data")
            mimeType = p.get("mimeType")
            if mimeType == 'text/plain':
                body_message = base64.urlsafe_b64decode(data)
            elif mimeType == 'text/html':
                body_html = base64.urlsafe_b64decode(data)
    # without attachment
    elif mimeType == 'text/plain':
        body_message = base64.urlsafe_b64decode(data)
    elif mimeType == 'text/html':
        body_html = base64.urlsafe_b64decode(data)
final_result = str(body_message, 'utf-8')
Use a recursive function to process the parts:
def processParts(parts):
    # Start from None so the return below works even if one MIME type is missing
    body_message = None
    body_html = None
    for part in parts:
        body = part.get("body")
        data = body.get("data")
        mimeType = part.get("mimeType")
        if mimeType == 'multipart/alternative':
            subparts = part.get('parts')
            [body_message, body_html] = processParts(subparts)
        elif mimeType == 'text/plain':
            body_message = base64.urlsafe_b64decode(data)
        elif mimeType == 'text/html':
            body_html = base64.urlsafe_b64decode(data)
    return [body_message, body_html]
[body_message, body_html] = processParts(parts)
final_result = str(body_message, 'utf-8')
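The recursive idea above can be taken a step further so that it does not depend on one particular mimeType for the container. This is a minimal sketch under some assumptions: find_bodies and sample_payload are hypothetical names, the function receives the whole payload (not just its parts), and it descends into anything that has a 'parts' list, which also covers single-part messages whose data sits directly in payload['body'].

```python
import base64


def find_bodies(payload):
    """Walk a message payload and return (plain_text, html), each a str or None."""
    plain, html = None, None
    mime_type = payload.get("mimeType", "")
    data = payload.get("body", {}).get("data")
    if mime_type == "text/plain" and data:
        plain = base64.urlsafe_b64decode(data).decode("utf-8", errors="replace")
    elif mime_type == "text/html" and data:
        html = base64.urlsafe_b64decode(data).decode("utf-8", errors="replace")
    # Any container part (multipart/mixed, multipart/alternative, ...) has 'parts'.
    for part in payload.get("parts", []):
        sub_plain, sub_html = find_bodies(part)
        # Keep the first match so a later part does not overwrite the body.
        if plain is None:
            plain = sub_plain
        if html is None:
            html = sub_html
    return plain, html


# Hypothetical payload: the body is nested one level below an attachment container.
sample_payload = {
    "mimeType": "multipart/mixed",
    "body": {"size": 0},
    "parts": [
        {
            "mimeType": "multipart/alternative",
            "body": {"size": 0},
            "parts": [
                {"mimeType": "text/plain", "body": {"data": "aGVsbG8="}},
                {"mimeType": "text/html", "body": {"data": "PGI-aGVsbG88L2I-"}},
            ],
        },
        {"mimeType": "application/pdf", "body": {"attachmentId": "hypothetical-id"}},
    ],
}

plain_text, html_text = find_bodies(sample_payload)
print(plain_text)
print(html_text)
```

With a real message the call would be find_bodies(msg['payload']). Parts that only carry an attachmentId are skipped because they have no inline data.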
I tested the code with Python 2, which was my mistake. With Python 3, as you said, you have to use base64.urlsafe_b64decode(data) instead of base64.b64decode(data). I have already updated the code.
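For context on the two decoders: the URL-safe alphabet uses - and _ where standard base64 uses + and /, and the data in the question contains - characters, so it is in the URL-safe form. "Incorrect padding" can also show up when a string's length is not a multiple of 4. Below is a small sketch of a helper that handles both cases. The name decode_body_data is hypothetical, and whether padding is ever actually missing in your responses is an assumption, not something shown in the question.

```python
import base64


def decode_body_data(data):
    """Decode a URL-safe base64 string to text, tolerating stripped '=' padding."""
    # -len(data) % 4 is the number of '=' characters needed to reach a multiple of 4.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# 'PGI-aGVsbG88L2I-' is the URL-safe encoding of '<b>hello</b>' (note the '-').
print(decode_body_data("PGI-aGVsbG88L2I-"))
# 'aGVsbG8' is 'hello' with its trailing '=' removed.
print(decode_body_data("aGVsbG8"))
```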
I added a simple solution that may fit your needs. It takes the message from the snippet key, which is a simplified version of the message body that does not need decoding.
I also don't know how you obtained the text/html part with my code, which did not handle it. If you want to get it, you have to add a second if statement. I updated the code so you can see it.
Finally, what you obtained using base64.urlsafe_b64decode is a bytes variable. To obtain a string, you have to convert it using str(body_message, 'utf-8'). It is now in the code.
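If only the text/html part is available and you want the text without the tags, one option is the standard library's html.parser. This is just a sketch, not the only way to do it: TextExtractor and html_to_text are hypothetical names, and turning each br tag into a newline is a design choice of this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Keep line breaks readable by turning <br> into a newline.
        if tag == "br":
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html_bytes):
    """Decode the bytes returned by urlsafe_b64decode and strip the HTML tags."""
    parser = TextExtractor()
    parser.feed(html_bytes.decode("utf-8"))
    parser.close()
    return "".join(parser.chunks).strip()


print(html_to_text(b'<div dir="ltr">Hello <br><br>World &amp; all</div>\r\n'))
```

Here html_bytes would be the body_html value from the code above. Link targets in href attributes are dropped, and only the visible link text is kept.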
Upvotes: 3