I'm obtaining emails using imaplib in Python/Django.
My goal is to read both plain-text and HTML emails.
I'm using:
mail.select('inbox', readonly=True)
result, data = mail.uid('fetch', email_uid, '(RFC822)')
raw_email = data[0][1]
email_message = email.message_from_string(raw_email)
#print "EMAIL:",email_message
#print "HEADERS",email_message.items()
subject = get_decoded_header(email_message['Subject'])
from_address = get_decoded_header(email_message['From'])
date = email_message['Date']
date = parse_date(date)
body = ''+get_first_text_block(email_message)
This is the code for get_first_text_block, which I found on the web:
def get_first_text_block(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return email_message_instance.get_payload()
    # In cases of emails with empty body
    return ''
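To see what this function actually returns, here is a minimal Python 3 sketch run against a locally built multipart/alternative message instead of one fetched over IMAP (the sample message is made up). Alternatives are ordered from plainest to richest (RFC 2046), so the loop stops at the text/plain part and never reaches text/html. And because get_payload() is called without decode=True, the text comes back still in its Content-Transfer-Encoding (base64 here; quoted-printable is also common).

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def get_first_text_block(email_message_instance):
    # Same logic as the function above: payload of the first text/* part.
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return email_message_instance.get_payload()
    return ''


# Made-up sample in the usual client layout: text/plain first, text/html last.
msg = MIMEMultipart('alternative')
msg.attach(MIMEText('Hello,\n\nSecond paragraph.', 'plain', 'utf-8'))
msg.attach(MIMEText('<p>Hello,</p><p>Second paragraph.</p>', 'html', 'utf-8'))

first = get_first_text_block(msg)
print(first)                    # base64 text, not the readable body
print(base64.b64decode(first))  # b'Hello,\n\nSecond paragraph.'
print(msg.get_payload()[1].get_content_type())  # text/html, never reached
```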
The problem is that the text doesn't appear formatted. Specifically, if it's a plain-text email, the text appears as one big consolidated string instead of keeping its line breaks, paragraphs and blank lines.
If it's an HTML email, the HTML isn't rendered at all; instead it shows up as plain text with fragments of HTML inside (even when using the |safe filter in Django).
I suspect that something like an improper conversion of the email payload to a string is happening, but I checked everything and couldn't find what is wrong.
What am I doing wrong?
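The fetch-and-parse step in the question appears to be written for Python 2 (note the print statements), where data[0][1] is a str. As a sketch for Python 3.6+, where imaplib returns bytes, the same step could use email.message_from_bytes with policy.default. The resulting EmailMessage offers get_body(), which picks the preferred text part, and get_content(), which undoes the transfer encoding and applies the charset. raw_email below is a hand-written stand-in for the fetched data, not real server output.

```python
import email
from email import policy

# Hypothetical stand-in for data[0][1]; imaplib returns bytes on Python 3.
raw_email = (
    b"Subject: Hello\r\n"
    b"From: alice@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"First line=\r\n"
    b" continued.\r\n"
    b"\r\n"
    b"Second paragraph.\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>First line continued.</p><p>Second paragraph.</p>\r\n"
    b"--XYZ--\r\n"
)

# Parse bytes, not str; policy.default gives an EmailMessage with get_body().
email_message = email.message_from_bytes(raw_email, policy=policy.default)
subject = str(email_message["Subject"])

# get_body() walks multipart/alternative and returns the preferred part.
html_part = email_message.get_body(preferencelist=("html",))
plain_part = email_message.get_body(preferencelist=("plain",))

# get_content() undoes quoted-printable/base64 and decodes the charset.
html_body = html_part.get_content() if html_part is not None else ""
plain_body = plain_part.get_content() if plain_part is not None else ""
```

Even when decoded, plain_body still loses its line breaks once it is rendered as HTML, because HTML collapses whitespace; Django's linebreaks or linebreaksbr filters (or a pre element) keep them visible.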
Upvotes: 5
Views: 1086
Reputation: 11
To extract the plain-text version you can use the code below. If you want the HTML version of the email, just replace != 'plain' by != 'html'.
import email
resp, data = M.FETCH(1, '(RFC822)')
mail = email.message_from_string(data[0][1])
for part in mail.walk():
    print 'Content-Type:',part.get_content_type()
    print 'Main Content:',part.get_content_maintype()
    print 'Sub Content:',part.get_content_subtype()
for part in mail.walk():
    if part.get_content_maintype() == 'multipart':
        continue
    if part.get_content_subtype() != 'plain':
        continue
    payload = part.get_payload()
    print payload
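A Python 3 port of the same walk might look like the sketch below; extract_bodies and the sample message are hypothetical. It passes decode=True so base64 or quoted-printable bodies come back as readable text, decodes with the part's declared charset, takes the subtype as a parameter instead of editing the != 'plain' test, and (unlike the original) also requires the main type to be text.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_bodies(mail, subtype='plain'):
    """Return the decoded text of every text/<subtype> leaf part."""
    bodies = []
    for part in mail.walk():
        if part.get_content_maintype() == 'multipart':
            continue  # containers carry no body of their own
        if part.get_content_maintype() != 'text' or part.get_content_subtype() != subtype:
            continue
        charset = part.get_content_charset() or 'us-ascii'
        # decode=True undoes base64/quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        bodies.append(payload.decode(charset, errors='replace'))
    return bodies


# Hypothetical stand-in for email.message_from_bytes(data[0][1]).
sample = MIMEMultipart('alternative')
sample.attach(MIMEText('Line one\n\nLine two', 'plain', 'utf-8'))
sample.attach(MIMEText('<p>Line one</p><p>Line two</p>', 'html', 'utf-8'))
mail = email.message_from_bytes(sample.as_bytes())

for part in mail.walk():
    print('Content-Type:', part.get_content_type())

print(extract_bodies(mail))          # ['Line one\n\nLine two']
print(extract_bodies(mail, 'html'))  # ['<p>Line one</p><p>Line two</p>']
```

A text/plain attachment would also match this filter; checking part.get_content_disposition() (Python 3.5+) for 'attachment' is one way to skip those.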
Upvotes: 1
Reputation: 3760
The problem is that you are using only the first text block as the email body. Try the following instead and see if it works; it's not a Django problem.
body = email_message.get_payload()[1].get_payload()
Try changing the index and check whether you get the HTML part.
Based on that, modify the function so that it returns the part you want as the email body.
EDIT: I am assuming here that you are looking at a multipart message.
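The index approach can be checked with a small Python 3 sketch on locally built messages; body_at and the sample messages are hypothetical. In a plain multipart/alternative message index 1 is the HTML part, but when an attachment is present the alternative is usually nested inside multipart/mixed, so index 1 then points at the attachment. That is why the index has to match the structure of the messages you actually receive.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def body_at(message, index):
    """Hypothetical helper: content type and decoded text of part `index`."""
    part = message.get_payload()[index] if message.is_multipart() else message
    if part.is_multipart():
        return part.get_content_type(), None  # a nested container, not a body
    charset = part.get_content_charset() or 'us-ascii'
    text = part.get_payload(decode=True).decode(charset, errors='replace')
    return part.get_content_type(), text


# multipart/alternative: index 0 is text/plain, index 1 is text/html.
alternative = MIMEMultipart('alternative')
alternative.attach(MIMEText('Plain body', 'plain'))
alternative.attach(MIMEText('<b>HTML body</b>', 'html'))

# With an attachment, the alternative is nested inside multipart/mixed.
mixed = MIMEMultipart('mixed')
mixed.attach(alternative)
attachment = MIMEText('notes', 'plain')
attachment.add_header('Content-Disposition', 'attachment', filename='notes.txt')
mixed.attach(attachment)

print(body_at(alternative, 1))  # ('text/html', '<b>HTML body</b>')
print(body_at(mixed, 1))        # ('text/plain', 'notes')
print(body_at(mixed, 0))        # ('multipart/alternative', None)
```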
Upvotes: 2