kravb

Reputation: 558

Parsing an RFC822 attachment with the email library

I have the same issue as this post. An email has an attachment that is message/rfc822 and I am trying to get the content and subject of that attachment.

My code is as follows:

import email
from email import policy

with open("raw_email_message", 'rb') as message:
    mime_email_content = email.message_from_binary_file(message, policy=policy.default)

    for part in mime_email_content.walk():
        if "attachment" not in str(part.get("Content-Disposition")):
            continue

        if part.get("Content-Type").startswith("message/"):
            part_contents = part.as_string()
            for header in part._payload[0]._headers:
                if header[0] == "Subject":
                    filename = header[1]
        else:
            part_contents = part.get_payload(decode=True)
            filename = part.get_filename()

part.as_string() returns too much information; I only need the body and the standard headers, such as To and From. I'm hoping there is a more elegant way to get the message body and its headers. Ultimately, I need to write the attachment out as its own text file.

Upvotes: 1

Views: 2508

Answers (2)

Stanislav Ivanov

Reputation: 1974

Walk the rfc822 attachments recursively:

import email
from email.header import decode_header


def readable_header(h):
    raw_header = decode_header(h)
    header = []
    for part, encoding in raw_header:
        if isinstance(part, bytes):
            # Fall back to ASCII when decode_header reports no charset.
            header.append(part.decode(encoding or 'ascii'))
        else:
            header.append(part)
    return header

def on_file_found(part):
    filename = readable_header(part.get_filename())
    part_contents = part.get_payload(decode=True)
    print('Attached file', filename, len(part_contents), 'bytes')

def on_message_found(content):
    print('Subject:', readable_header(content['Subject']))
    print('From:', readable_header(content['From']))
    print('To:', readable_header(content['To']))
    for part in content.walk():
        if "attachment" in str(part.get("Content-Disposition")):
            on_file_found(part)
        # get_content_type() ignores parameters such as name="..." in the header.
        if part.get_content_type() == "message/rfc822":
            for payload in part.get_payload():
                on_message_found(email.message_from_bytes(payload.as_bytes()))

with open("test.txt", 'rb') as message:
    on_message_found(email.message_from_binary_file(message))

The readable_header function returns a list because decode_header splits a header into one entry per encoded or plain segment, and some headers contain several.
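
To illustrate why a list comes back, here is a small sketch using a made-up Subject value (the header text and the decode_chunk helper are assumptions for illustration, not part of the answer above). It shows decode_header yielding more than one segment and one way to join them into a single string:

```python
from email.header import decode_header

# Hypothetical header value: one RFC 2047 encoded word followed by plain text.
raw_subject = "=?utf-8?q?Caf=C3=A9?= menu"

# decode_header returns a list of (value, charset) pairs, one per segment.
chunks = decode_header(raw_subject)


def decode_chunk(value, charset):
    """Turn one decode_header segment into str (illustrative helper)."""
    if isinstance(value, bytes):
        return value.decode(charset or "ascii", errors="replace")
    return value


decoded_parts = [decode_chunk(value, charset) for value, charset in chunks]
subject = "".join(decoded_parts)

print(decoded_parts)
print(subject)
```

If the message is parsed with policy=policy.default, as in the question, header values are normally already decoded when accessed, so this step may not be needed at all.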

Upvotes: 1

kravb

Reputation: 558

I realized the best way to handle this situation, and maybe the only way, is to treat the attachment just like the original message and call walk() again, like this:

    for part in self.mime_email_content.walk():
        if "attachment" not in str(part.get("Content-Disposition")):
            continue

        if part.get("Content-Type").startswith("message/"):
            for item in part.walk():
                pass  # do work here
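
As a rough sketch of the same idea with the policy.default API from the question: assuming the message was parsed with policy=policy.default, get_content() on a message/rfc822 part should return the embedded message as its own EmailMessage, which can then be queried like the outer one. The sample message, the addresses, and the summarize_attached_message helper below are made up for illustration.

```python
import email
from email import policy
from email.message import EmailMessage


def summarize_attached_message(part):
    """Return (headers, body) for a message/rfc822 part (illustrative helper)."""
    # Under policy.default this yields the embedded message object.
    inner = part.get_content()
    wanted = ("From", "To", "Subject")
    headers = {name: str(inner[name]) for name in wanted if inner[name] is not None}
    # Prefer a plain-text body, falling back to HTML if that is all there is.
    body_part = inner.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return headers, body


# Build a sample email with a message/rfc822 attachment (made-up data).
attached = EmailMessage()
attached["From"] = "alice@example.com"
attached["To"] = "bob@example.com"
attached["Subject"] = "Forwarded report"
attached.set_content("Body of the attached message.")

outer = EmailMessage()
outer["Subject"] = "Fwd"
outer.set_content("See attached.")
outer.add_attachment(attached)

# Round-trip through bytes to mimic reading a raw email from disk.
parsed = email.message_from_bytes(outer.as_bytes(), policy=policy.default)

results = []
for part in parsed.iter_attachments():
    if part.get_content_type() == "message/rfc822":
        headers, body = summarize_attached_message(part)
        text = "\n".join(f"{k}: {v}" for k, v in headers.items()) + "\n\n" + body
        results.append((headers, body, text))
        print(text)
```

The text string built in the loop is what could be written to its own file, for example with pathlib.Path(...).write_text(text).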

Upvotes: 1
