Reputation: 493
Alright so I load the email in from gmail with imaplib and then when I'm trying to parse the email it does not separate anything out in a usable format. I suspect this is because somewhere in the process '<' or '>' are being added to the raw email.
Here is what the debugger is showing me after I have called the method:
As you can see it hasn't really parsed anything into a usable format.
Here is the code I'm using: (NOTE: the .replace('>', '')
seems to have no effect on the end result.)
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('[email protected]', 'password')
mail.list()
mail.select('inbox')
typ, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
# get the most recent email id
latest_email_id = int( id_list[-1] )
# iterate through 15 messages in descending order starting with latest_email_id
# the '-1' dictates reverse looping order
for i in range( latest_email_id -10, latest_email_id-15, -1 ):
typ, data = mail.fetch( str(i), '(RFC822)' )
for response_part in data:
if isinstance(response_part, tuple):
msg = str(response_part[1]).replace('<', '')
msg = msg.replace('>', '')
msg = email.message_from_string(msg)
#msg = feedparser.parse(response_part[1])
varSubject = msg['subject']
varFrom = msg['from']
python email.message_from_string() parse problems and Parsing email with Python both had very similar and identical problems to me (I think) and they solved it by altering their email, however I'm reading my email straight from Google's servers so I'm not sure exactly what to do to the email to fix it up since removing all '<' and '>' obviously won't work.
So, how do I fix the email that is read from imaplib so that it can be easily read with email.message_from_string()? (Or any other improvements/possible solutions as I'm not 100% certain the '<' and '>' are actually the problem, I'm only guessing based off of those other questions asked.)
Cheers
Upvotes: 0
Views: 3040
Reputation: 1307
You shouldn't parse <
, >
and data between them - it is like parsing HTML, but much more complicated. There are existing solutions to do it.
Here is my code that can read mail with attachments, extract data that can be used for further use and process it to human and code readable format. As you can see, all tasks are being made by third-party modules.
from datetime import datetime
import imaplib
import email
import html2text
from os import path
class MailClient(object):
def __init__(self):
self.m = imaplib.IMAP4_SSL('your.server.com')
self.Login()
def Login(self):
result, data = self.m.login('[email protected]', 'p@s$w0rd')
if result != 'OK':
raise Exception("Error connecting to mailbox: {}".format(data))
def ReadLatest(self, delete = True):
result, data = self.m.select("inbox")
if result != 'OK':
raise Exception("Error reading inbox: {}".format(data))
if data == ['0']:
return None
latest = data[0].split()[-1]
result, data = self.m.fetch(latest, "(RFC822)")
if result != 'OK':
raise Exception("Error reading email: {}".format(data))
if delete:
self.m.store(latest, '+FLAGS', '\\Deleted')
message = email.message_from_string(data[0][1])
res = {
'From' : email.utils.parseaddr(message['From'])[1],
'From name' : email.utils.parseaddr(message['From'])[0],
'Time' : datetime.fromtimestamp(email.utils.mktime_tz(email.utils.parsedate_tz(message['Date']))),
'To' : message['To'],
'Subject' : email.Header.decode_header(message["Subject"])[0][0],
'Text' : '',
'File' : None
}
for part in message.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get_content_maintype() == 'text':
# reading as HTML (not plain text)
_html = part.get_payload(decode = True)
res['Text'] = html2text.html2text(_html)
elif part.get_content_maintype() == 'application' and part.get_filename():
fname = path.join("your/folder", part.get_filename())
attachment = open(fname, 'wb')
attachment.write(part.get_payload(decode = True))
attachment.close()
if res['File']:
res['File'].append(fname)
else:
res['File'] = [fname]
return res
def __del__(self):
self.m.close()
Upvotes: 1