Reputation: 11
I am using imaplib to fetch email messages and need to extract the text from them.
My messages are multipart, so I fetch each one and walk its parts:
    import email

    typ, data = account.fetch(msg_uid, '(RFC822)')
    raw_email = data[0][1]
    msg = email.message_from_bytes(raw_email)
    payload_msg = get_message(msg)

    def get_message(message):
        '''
        This function returns a decoded body text of a message, depending on multipart/* or text/*
        :param message: message content of an email
        :return: body of email message
        '''
        body = None
        if message.is_multipart():
            print(str(message.get_content_type()) + ' is the message content type')
            for part in message.walk():
                cdispo = str(part.get('Content-Disposition'))
                if part.is_multipart():
                    for subpart in part.walk():
                        cdispo = str(subpart.get('Content-Disposition'))
                        if subpart.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
                            body = subpart.get_payload(decode=True)
                        elif subpart.get_content_type() == 'text/html':
                            body = subpart.get_payload(decode=True)
                elif part.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
                    body = part.get_payload(decode=True)
                elif part.get_content_type() == 'text/html' and 'attachment' not in cdispo:
                    body = part.get_payload(decode=True)
        elif message.get_content_type() == 'text/plain':
            body = message.get_payload(decode=True)
        elif message.get_content_type() == 'text/html':
            body = message.get_payload(decode=True)
        return body
In the code above, msg is the parsed message, and each part's body is read with get_payload(decode=True). But when I check the type of the returned body, it is still bytes. Why?
Isn't decode=True supposed to convert it to a string? Strangely, when I pass decode=False, I get a str instead. I expected the opposite. What am I doing wrong here?
P.S.: raw_email is bytes here, and msg is an email.message.Message object.
Upvotes: 1
Views: 1392
Reputation: 5817
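According to the docs, the decode flag is not about text (character set) encoding, but rather about the Content-Transfer-Encoding, i.e. quoted-printable and base64.
With decode=True the transfer encoding is undone and the result is bytes; turning those bytes into a str is a separate step that uses the part's charset. With decode=False the payload is returned as is, which is why you get a str that may still be quoted-printable or base64 text.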
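To make the difference concrete, here is a minimal sketch of getting a str out of get_payload(decode=True). The sample message and the fallback to "utf-8" when no charset is declared are assumptions made for illustration, not part of the original code.

```python
import base64
import email

# Hypothetical sample: a single-part message with a base64-encoded UTF-8 body.
text = "Héllo"
raw_email = (
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(text.encode("utf-8")) + b"\r\n"
)
msg = email.message_from_bytes(raw_email)

# decode=False (the default): the payload as stored, a str that is still base64 text.
encoded = msg.get_payload()

# decode=True: the transfer encoding is undone, and the result is bytes.
raw = msg.get_payload(decode=True)

# Converting bytes to str is a separate step that needs the part's charset.
# Falling back to "utf-8" is an assumption; real mail may need another default.
charset = msg.get_content_charset() or "utf-8"
body = raw.decode(charset, errors="replace")

print(type(encoded), type(raw), type(body))
```

In get_message, the same two lines (get_content_charset() followed by bytes.decode()) could be applied to whichever part is selected.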
Also, the docs say about the get_payload() method:
This is a legacy method. On the EmailMessage class its functionality is replaced by get_content() and iter_parts().
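So you should consider using those methods instead. Note that they are only available on EmailMessage objects, which you get by parsing with policy=email.policy.default.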
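As a rough sketch of the newer API, assuming a simple multipart/alternative message (the sample bytes and the "plain before html" preference are made up for illustration), get_body() picks the best body part and get_content() returns it already decoded to str:

```python
import email
from email import policy

# Hypothetical sample: multipart/alternative with a quoted-printable plain part.
raw_email = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"H=C3=A9llo\r\n"
    b"--sep\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>Hello</p>\r\n"
    b"--sep--\r\n"
)

# policy.default makes the parser return an EmailMessage instead of a legacy Message.
msg = email.message_from_bytes(raw_email, policy=policy.default)

# get_body() searches the message for the preferred body part and skips attachments.
part = msg.get_body(preferencelist=("plain", "html"))

# get_content() undoes the transfer encoding and decodes with the declared charset.
body = part.get_content() if part is not None else None

print(type(body), body)
```

With this approach the manual walk over parts and the Content-Disposition checks in get_message may no longer be needed, although messages with unusual structure could still require iter_parts().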
Upvotes: 1