Reputation: 639
I want to fetch the from, subject, and body of an email using its message numbers. I don't want any attachments or images, just the plain text of the body. Can someone please help me with a code snippet? I am stuck at this point.
This is the current code snippet I am using to get the data:
import imaplib
import config
import bs4
import email
imap = imaplib.IMAP4_SSL(config.imap_server,config.imap_port)
r, d = imap.login(config.username, config.password)
imap.select("Inbox")
r, d = imap.uid('search', None, "ALL")
message_numbers = d[0].decode('utf8').split(' ')
for msg_uid in message_numbers:
    r, d = imap.uid('FETCH', msg_uid, '(RFC822)')
    try:
        raw_email = d[0][1].decode('utf8')
    except:
        raw_email = str(bs4.BeautifulSoup(d[0][1],'lxml'))
    email_message = email.message_from_string(raw_email)
    print(email_message) # here I need only subject, from and body in string format and I don't want attachments
Upvotes: 0
Views: 2229
Reputation: 189387
Your code implements several incorrect assumptions.
You can't assume that an IMAP message is UTF-8 clean. In fact, chances are it isn't.
You can't assume an email body is HTML. Again, chances are, it isn't. Anyway, using BeautifulSoup and LXML to pick apart the email message itself is pretty crazy; use an email parser, not an XML parser. Python has one built in.
The Python email library has an older version which is still supported, but the new version in 3.6+ has a feature you definitely want here: it can try to guess what you mean by "the body" when there are multiple parts. Of course, this is a heuristic only; "the body" is not well-defined in a multipart message. Perhaps see also What are the "parts" in a multipart email?
The IMAP FETCH command will mark messages as "seen"; perhaps you want to use readonly=True when selecting the inbox? See also Fetch an email with imaplib but do not mark it as SEEN
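To illustrate the first two points without an IMAP server, here is a minimal offline sketch. The raw message is made up for the example (the addresses and text are hypothetical); it carries a Latin-1 body, so decoding the whole message as UTF-8 fails, while the built-in parser reads the bytes directly and applies the charset declared in the headers.

```python
import email
from email.policy import default

# Hypothetical raw message as it might arrive from FETCH: bytes, not str.
# The body is Latin-1 sent as 8bit, so the message is not valid UTF-8.
raw = (
    b"From: Alice <alice@example.com>\r\n"
    b"Subject: =?iso-8859-1?q?Caf=E9_menu?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Caf\xe9 au lait\r\n"
)

# Decoding the entire message as UTF-8 breaks on the Latin-1 byte.
try:
    raw.decode("utf8")
    utf8_clean = True
except UnicodeDecodeError:
    utf8_clean = False

# Parsing the bytes lets the email library handle each part's charset
# and decode RFC 2047 encoded words in the headers.
message = email.message_from_bytes(raw, policy=default)
subject = str(message["subject"])
body_text = message.get_content().strip()

print(utf8_clean, subject, body_text)
```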
import imaplib
import config
# import bs4 # not used
import email
from email.policy import default # for Python 3.6+ EmailMessage support
# Use a context manager
with imaplib.IMAP4_SSL(config.imap_server,config.imap_port) as imap:
    r, d = imap.login(config.username, config.password)
    imap.select("Inbox", readonly=True)
    r, d = imap.uid('search', None, "ALL")
    message_numbers = d[0].decode('utf8').split(' ')
    for msg_uid in message_numbers:
        r, d = imap.uid('FETCH', msg_uid, '(RFC822)')
        message = email.message_from_bytes(d[0][1], policy=default)
        print("from:", message['from'])
        print("subject:", message['subject'])
        # Guess at "the" body part
        # Maybe parse this like before if it is an HTML part?
        print(message.get_body().get_content())
        print()
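Since the question asks for plain text only, one possible refinement is to tell get_body() which part type to prefer. By default it prefers HTML over plain text, and it returns None when nothing suitable is found. The following is a sketch under those assumptions, not a definitive implementation: the helper name plain_text_body and the sample message are made up for illustration, and the sample is built in memory so it runs without a server.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def plain_text_body(message):
    """Return the text/plain body as a str, or None if there is none.

    get_body() skips parts marked as attachments, so attached files
    are never returned here.
    """
    part = message.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None


# Hypothetical test message: plain text, an HTML alternative, one attachment.
msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["To"] = "Bob <bob@example.com>"
msg["Subject"] = "Report"
msg.set_content("Plain text body.\n")
msg.add_alternative("<p>HTML body.</p>", subtype="html")
msg.add_attachment(
    b"\x00\x01",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

# Round-trip through bytes, as if the message had come from FETCH.
parsed = message_from_bytes(msg.as_bytes(), policy=default)
text = plain_text_body(parsed)
print("from:", parsed["from"])
print("subject:", parsed["subject"])
print(text)

# A message with only an HTML part has no plain-text body to return.
html_only = EmailMessage()
html_only.set_content("<p>Only HTML.</p>", subtype="html")
missing = plain_text_body(html_only)
```

In the loop above, the same helper could replace the final get_body() call; for HTML-only messages you would still need to decide whether to skip the message or convert the HTML to text yourself.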
Upvotes: 1
Reputation: 6726
from imap_tools import MailBox, A
with MailBox('imap.mail.com').login('[email protected]', 'pwd', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        print(msg.subject)
        print(msg.from_)
        print(msg.text or msg.html)
https://github.com/ikvk/imap_tools
Regards, imap_tools author.
Upvotes: 3