chitra

Reputation: 11

Why am I getting a byte string even after decoding when parsing email messages in Python 3?

I am using imaplib to fetch email messages and need to extract the text from them.

My messages are multipart, so I parse them like this:

```python
import email


def get_message(message):
    '''
    This function returns the decoded body text of a message, depending on multipart/* or text/*
    :param message: message content of an email
    :return: body of email message
    '''
    body = None
    if message.is_multipart():
        print(str(message.get_content_type()) + ' is the message content type')
        for part in message.walk():
            cdispo = str(part.get('Content-Disposition'))
            if part.is_multipart():
                for subpart in part.walk():
                    cdispo = str(subpart.get('Content-Disposition'))
                    if subpart.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
                        body = subpart.get_payload(decode=True)
                    elif subpart.get_content_type() == 'text/html' and 'attachment' not in cdispo:
                        body = subpart.get_payload(decode=True)
            elif part.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
                body = part.get_payload(decode=True)
            elif part.get_content_type() == 'text/html' and 'attachment' not in cdispo:
                body = part.get_payload(decode=True)
    elif message.get_content_type() == 'text/plain':
        body = message.get_payload(decode=True)
    elif message.get_content_type() == 'text/html':
        body = message.get_payload(decode=True)
    return body


# account: an authenticated imaplib.IMAP4_SSL connection with a mailbox selected (set up elsewhere)
typ, data = account.fetch(msg_uid, '(RFC822)')
raw_email = data[0][1]
msg = email.message_from_bytes(raw_email)
payload_msg = get_message(msg)
```

In the code above, msg is the parsed message, and get_message() calls get_payload() on it (or on its parts) with decode=True. But when I check the type of the returned body, it is still bytes. Why?

Isn't it supposed to be converted to a string? Stranger still, when I pass decode=False, I get a str! What am I doing wrong? I expected the opposite.

P.S.: raw_email is bytes here, and msg is an email.message.Message object.
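A minimal, self-contained reproduction of the behavior, assuming a hand-built single-part message (the addresses and the base64 body are made up) in place of the bytes fetched over IMAP:

```python
import email

# Hypothetical raw message standing in for data[0][1] from the IMAP fetch.
# The body is "Hello, wörld!" encoded as UTF-8 and then base64.
raw_email = (
    b"From: sender@example.com\r\n"
    b"To: me@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8sIHfDtnJsZCE=\r\n"
)

msg = email.message_from_bytes(raw_email)

encoded = msg.get_payload(decode=False)  # str, but still base64 text
decoded = msg.get_payload(decode=True)   # bytes, with the base64 undone

print(type(encoded), repr(encoded))
print(type(decoded), repr(decoded))
```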

Upvotes: 1

Views: 1392

Answers (1)

lenz

Reputation: 5817

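According to the docs, the decode flag is not about text (character-set) encoding, but about the Content-Transfer-Encoding, i.e. quoted-printable and base64. With decode=True, get_payload() undoes that transfer encoding, and the docs state that "in all cases the returned value is binary data", i.e. bytes. Turning those bytes into a str is a separate step that uses the part's charset. With decode=False you get the payload as stored, which is why it is a str, but it may still be quoted-printable or base64 encoded.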
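A sketch of that second step with the legacy API: decode the bytes from get_payload(decode=True) using the charset the part declares. The helper name part_to_text is hypothetical, and falling back to UTF-8 when no charset is declared is an assumption, not something the email package does for you.

```python
import email


def part_to_text(part):
    """Return a non-multipart part's body as str (hypothetical helper).

    get_payload(decode=True) only undoes the Content-Transfer-Encoding
    (base64 / quoted-printable) and returns bytes; decoding the charset
    is done separately below.
    """
    raw = part.get_payload(decode=True)
    if raw is None:  # multipart containers return None when decode=True
        return None
    # Assumption: fall back to UTF-8 when the part declares no charset.
    charset = part.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Example: a quoted-printable part in Latin-1.
msg = email.message_from_bytes(
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9\r\n"
)
print(repr(part_to_text(msg)))
```

In get_message() above, each `body = ...get_payload(decode=True)` line could call a helper like this instead, so that body ends up as a str.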

Also, the docs say about the get_payload() method:

This is a legacy method. On the EmailMessage class its functionality is replaced by get_content() and iter_parts().

So you should consider using those methods instead.
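A minimal sketch of the newer API. Passing policy=policy.default to message_from_bytes() makes it return an EmailMessage; get_body() then picks the preferred text part, and get_content() undoes the transfer encoding and applies the charset, so text parts come back as str. The helper name extract_text and the sample message are hypothetical.

```python
import email
from email import policy


def extract_text(raw_email):
    """Return the preferred body of a raw message as str, or None (hypothetical helper)."""
    # policy.default makes message_from_bytes build an EmailMessage.
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    # Prefer text/plain, fall back to text/html.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    if body_part is None:
        return None
    # For text/* parts, get_content() decodes base64/quoted-printable
    # and the charset, returning str.
    return body_part.get_content()


# Made-up multipart/alternative message for illustration.
raw_email = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8sIHfDtnJsZCE=\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Hello</p>\r\n"
    b"--XYZ--\r\n"
)

text = extract_text(raw_email)
print(type(text), repr(text))
```

With this approach the manual walk over parts and the Content-Disposition checks are largely unnecessary, since get_body() skips attachments when choosing the body.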

Upvotes: 1
