Reputation: 46411
I want to parse email addresses from a To: email field.
When looping over the messages in an mbox:
import mailbox

mbox = mailbox.mbox('test.mbox')
for m in mbox:
    print(m['To'])
we can get things like:
[email protected], Blahblah <[email protected]>, <[email protected]>, "Hey" <[email protected]>
That should be parsed into:
[{email: "[email protected]", name: ""},
{email: "[email protected]", name: "Blahblah"},
{email: "[email protected]", name: ""},
{email: "[email protected]", name: "Hey"}]
Is there something already built in (in mailbox or another module) for this?
I read this doc a few times but didn't find anything relevant.
Upvotes: 3
Views: 391
Reputation: 82734
You can use email.utils.getaddresses() for this:
>>> from email.utils import getaddresses
>>> getaddresses(['[email protected], Blahblah <[email protected]>, <[email protected]>, "Hey" <[email protected]>'])
[('', '[email protected]'), ('Blahblah', '[email protected]'), ('', '[email protected]'), ('Hey', '[email protected]')]
(Note that the function expects a list, so you have to enclose the string in [...].)
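To get the list-of-dicts shape asked for in the question, the tuples can be remapped in one step. This is a minimal sketch: the helper name parse_recipients and the example.com addresses are placeholders chosen for illustration, not part of the original posts.

```python
from email.utils import getaddresses

def parse_recipients(header_value):
    """Turn a raw To: header string into a list of {'email', 'name'} dicts."""
    # getaddresses() expects a list of header values and yields (name, address) pairs.
    return [{"email": addr, "name": name}
            for name, addr in getaddresses([header_value])]

# Placeholder addresses, used only for illustration.
recipients = parse_recipients(
    'a@example.com, Blahblah <b@example.com>, <c@example.com>, "Hey" <d@example.com>'
)
for entry in recipients:
    print(entry)
```

When working with a message object m, m.get_all('To', []) already returns a list of header values, so it can be passed to getaddresses() directly and also covers messages that have no To: header.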
Upvotes: 5
Reputation: 7474
Python provides email.header.decode_header() for decoding headers. The function decodes each atom and returns a list of (text, encoding) tuples that you still have to decode and join to get the full text.
For addresses, Python provides email.utils.getaddresses(), which splits addresses into a list of (display-name, address) tuples. The display-name needs to be decoded too, and the addresses must match the RFC 2822 syntax. The getmailaddresses() helper, which is not part of the standard library, does the whole job.
Here's a tutorial that might help: http://blog.magiksys.net/parsing-email-using-python-header
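The decode-and-join step described above might look roughly like the following sketch. The function names decode_display_name and decoded_addresses are hypothetical (this is not the tutorial's getmailaddresses()), the address is a placeholder, and the code assumes Python 3 and a charset that Python knows how to decode.

```python
from email.header import decode_header
from email.utils import getaddresses

def decode_display_name(raw_name):
    """Decode RFC 2047 encoded words in a display name and join the parts."""
    parts = []
    for text, charset in decode_header(raw_name):
        # Encoded parts come back as bytes; plain parts may already be str.
        if isinstance(text, bytes):
            text = text.decode(charset or "ascii", errors="replace")
        parts.append(text)
    return "".join(parts)

def decoded_addresses(header_value):
    """Split a header into (decoded display name, address) pairs."""
    return [(decode_display_name(name), addr)
            for name, addr in getaddresses([header_value])]

# Placeholder address with a UTF-8 encoded display name.
pairs = decoded_addresses("=?utf-8?q?J=C3=BCrgen?= <j@example.com>")
print(pairs)
```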
Upvotes: 0
Reputation: 46411
As pointed out by @TheSpooniest, the email module has a parser:
import email.utils
s = '[email protected], Blahblah <[email protected]>, <[email protected]>, "Hey" <[email protected]>'
# Caveat: a plain split(',') breaks on quoted display names that contain a comma.
for em in s.split(','):
    print(email.utils.parseaddr(em))
gives:
('', '[email protected]')
('Blahblah', '[email protected]')
('', '[email protected]')
('Hey', '[email protected]')
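One thing to watch with the split-based loop is a quoted display name that itself contains a comma. The sketch below, using made-up example.com addresses, compares the manual split with email.utils.getaddresses() on such a header.

```python
from email.utils import getaddresses, parseaddr

# Placeholder header: the first display name contains a comma inside quotes.
header = '"Doe, John" <john@example.com>, jane@example.com'

# Naive approach: the quoted name is cut in two, producing three fragments.
naive = [parseaddr(part) for part in header.split(',')]

# getaddresses() understands quoting and returns one pair per address.
robust = getaddresses([header])

print(len(naive), "fragments from split(',')")
print(robust)
```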
Upvotes: 1
Reputation: 2873
email.parser has the tools you're looking for. email.message is still relevant, because the parser returns messages using this structure, so you'll be getting your header data from that. But to actually read the files in, email.parser is the way to go.
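As a rough illustration of that flow, the sketch below parses a raw message with email.parser.HeaderParser and reads the To: header from the resulting message object. The raw message text and its addresses are invented for the example; combining the result with email.utils.getaddresses() is one possible way to finish the job.

```python
from email.parser import HeaderParser
from email.utils import getaddresses

# Invented raw message, used only for illustration.
raw = (
    "From: sender@example.com\n"
    "To: a@example.com, Blahblah <b@example.com>\n"
    "Subject: test\n"
    "\n"
    "Body text\n"
)

# HeaderParser reads only the headers and returns an email.message.Message.
msg = HeaderParser().parsestr(raw)

# get_all() returns every To: header value as a list (or the default if absent).
to_values = msg.get_all("To", [])
pairs = getaddresses(to_values)
print(pairs)
```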
Upvotes: 1