Reputation: 558
I have the same issue as this post. An email has an attachment that is message/rfc822 and I am trying to get the content and subject of that attachment.
My code is as follows:
import email
from email import policy

with open("raw_email_message", 'rb') as message:
    mime_email_content = email.message_from_binary_file(message, policy=policy.default)

for part in mime_email_content.walk():
    # Skip anything that is not marked as an attachment
    if "attachment" not in str(part.get("Content-Disposition")):
        continue
    if part.get("Content-Type").startswith("message/"):
        part_contents = part.as_string()
        # Reaching into private attributes to find the attached message's subject
        for header in part._payload[0]._headers:
            if header[0] == "Subject":
                filename = header[1]
    else:
        part_contents = part.get_payload(decode=True)
        filename = part.get_filename()
part.as_string() gives too much information, whereas only the body and the standard headers, such as To and From, are needed. I'm hoping there is a more elegant way to get the attached message's body and selected headers. Ultimately, I need to write the attachment's text out as its own file.
Reputation: 1974
Walk the rfc822 attachments recursively:
import email
from email.header import decode_header

def readable_header(h):
    # decode_header() returns one (part, encoding) pair per encoded chunk
    raw_header = decode_header(h)
    header = []
    for part, encoding in raw_header:
        if isinstance(part, bytes):
            header.append(part.decode(encoding) if encoding is not None else part.decode('ascii'))
        else:
            header.append(part)
    return header

def on_file_found(part):
    filename = readable_header(part.get_filename())
    part_contents = part.get_payload(decode=True)
    print('Attached file', filename, len(part_contents), 'bytes')

def on_message_found(content):
    print('Subject:', readable_header(content['Subject']))
    print('From:', readable_header(content['From']))
    print('To:', readable_header(content['To']))
    for part in content.walk():
        # get_content_type() ignores parameters such as charset or name
        if part.get_content_type() == "message/rfc822":
            # Checked first: get_payload(decode=True) returns None for such parts
            for payload in part.get_payload():
                on_message_found(email.message_from_bytes(payload.as_bytes()))
        elif "attachment" in str(part.get("Content-Disposition")):
            on_file_found(part)

with open("test.txt", 'rb') as message:
    on_message_found(email.message_from_binary_file(message))
The readable_header function returns a list because a single header can be made up of several parts, each with its own encoding.
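To illustrate why the header comes back in pieces, here is a minimal sketch. The sample Subject line is made up for the example. It also shows that parsing with policy=policy.default (as the question already does) should give you the decoded header as one string, so a helper like readable_header may not be needed in that case.

```python
from email import message_from_bytes, policy
from email.header import decode_header

# Hypothetical raw message: an encoded word followed by plain ASCII text
RAW = b"Subject: =?utf-8?q?Caf=C3=A9?= menu\n\nbody\n"

# Default (compat32) parsing keeps the header in its encoded form
legacy = message_from_bytes(RAW)
pieces = decode_header(legacy["Subject"])  # list of (part, encoding) pairs
joined = "".join(
    p.decode(enc or "ascii") if isinstance(p, bytes) else p
    for p, enc in pieces
)

# With policy.default the header object is already decoded text
modern = message_from_bytes(RAW, policy=policy.default)
subject = str(modern["Subject"])

print(pieces)
print(joined)
print(subject)
```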
Reputation: 558
I realized the best way to handle this situation, and maybe the only way, is to treat the attachment just like the original message and call walk() again, like this:
for part in self.mime_email_content.walk():
    if "attachment" not in str(part.get("Content-Disposition")):
        continue
    if part.get("Content-Type").startswith("message/"):
        # Treat the attached message like a top-level message
        for item in part.walk():
            pass  # do work here
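For anyone who wants something closer to a complete example, below is a rough sketch of one way to pull out just the body and a few headers of each attached message. It assumes the mail is parsed with policy.default, so the parts are EmailMessage objects and get_body() is available. The names attached_messages, message_to_text and build_sample are my own helpers, not part of the email package, and the sample message is invented so the snippet can run on its own. Note that it does not check Content-Disposition, so inline forwarded messages are picked up too.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def attached_messages(msg):
    """Yield each message embedded in a message/rfc822 part of msg."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of such a part is a one-element list
            # holding the embedded message itself.
            yield part.get_payload(0)


def message_to_text(inner, wanted=("From", "To", "Subject")):
    """Return the selected headers plus the plain-text body as one string."""
    headers = [f"{name}: {inner[name]}" for name in wanted if inner[name] is not None]
    body = inner.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return "\n".join(headers) + "\n\n" + text


def build_sample():
    """Build a made-up message with another message attached, as raw bytes."""
    inner = EmailMessage()
    inner["From"] = "alice@example.com"
    inner["To"] = "bob@example.com"
    inner["Subject"] = "Inner subject"
    inner.set_content("Hello from the inner message.")

    outer = EmailMessage()
    outer["From"] = "bob@example.com"
    outer["To"] = "carol@example.com"
    outer["Subject"] = "Fwd: Inner subject"
    outer.set_content("See attached.")
    outer.add_attachment(inner)  # stored as a message/rfc822 attachment
    return outer.as_bytes()


if __name__ == "__main__":
    parsed = message_from_bytes(build_sample(), policy=policy.default)
    for inner in attached_messages(parsed):
        print(message_to_text(inner))
```

To save each attachment as its own file, writing the string returned by message_to_text() with open(some_name, "w", encoding="utf-8") should be enough; how you pick the file name (for example from the Subject) is up to you.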