Taranjeet Singh

Reputation: 337

Parse a comma-separated list of emails in Python in the format "Name" <email>

Input (comma separated list):

"\"Mr ABC\" <[email protected]>, \"Foo, Bar\" <[email protected]>, [email protected]"

Expected output (list of 2-tuples):

[("Mr ABC", "[email protected]"), ("Foo, Bar", "[email protected]"), ("", "[email protected]")]

My first idea was to split on commas and call email.utils.parseaddr(address) on each piece, until I realized that the name part can also contain a comma, as in "Foo, Bar" above.
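
To illustrate the problem described above, here is a small sketch of the naive split. The example.com addresses are made up for illustration, since the real ones are redacted in this thread.

```python
# Hypothetical addresses for illustration only.
raw = '"Mr ABC" <abc@example.com>, "Foo, Bar" <foo@example.com>'

# Naive split: the comma inside the quoted name is treated as a separator too.
pieces = [piece.strip() for piece in raw.split(",")]
print(pieces)
```

Two addresses go in, but three pieces come out, because "Foo, Bar" is cut in half.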

email.utils.getaddresses(fieldvalues) is very close to what I need, but it accepts a sequence of strings, not a single comma-separated string.

Upvotes: 0

Views: 982

Answers (2)

Cyril N.

Reputation: 39889

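Use email.utils.getaddresses for that. It expects a sequence of header values, so wrap the whole string in a one-element list:

from email.utils import getaddresses

# Passing the bare string would iterate over its characters, so pass a list
emails = getaddresses(['"Mr ABC" <[email protected]>, "Foo, Bar" <[email protected]>, "[email protected]"'])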

=> [('Mr ABC', '[email protected]'), ('Foo, Bar', '[email protected]'), ('', '[email protected]')]
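
As a minimal runnable sketch of this approach: the example.com addresses below are hypothetical stand-ins for the redacted ones, and the third address is left unquoted as in the question. Note that getaddresses takes a sequence of strings, so the input is wrapped in a list.

```python
from email.utils import getaddresses

# Hypothetical addresses; the ones in the thread are redacted.
header = '"Mr ABC" <abc@example.com>, "Foo, Bar" <foo@example.com>, baz@example.com'

# getaddresses expects a sequence of header values, hence the one-element list.
pairs = getaddresses([header])
print(pairs)
# [('Mr ABC', 'abc@example.com'), ('Foo, Bar', 'foo@example.com'), ('', 'baz@example.com')]
```

The comma inside the quoted name "Foo, Bar" is kept as part of the name rather than treated as a separator.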

Upvotes: 1

Wiktor Stribiżew

Reputation: 627082

You may use the following regex:

import re
p = re.compile(r'"([^"]+)"(?:\s+<([^<>]+)>)?')
test_str = '"Mr ABC" <[email protected]>, "Foo, Bar" <[email protected]>, "[email protected]"'
print(re.findall(p, test_str))

Output: [('Mr ABC', '[email protected]'), ('Foo, Bar', '[email protected]'), ('[email protected]', '')]


The regex matches:

  • " - a double quote
  • ([^"]+) - (Group 1) 1 or more characters other than a double quote
  • " - a double quote

Then, an optional non-capturing group is introduced with the (?:...)? construct: (?:\s+<([^<>]+)>)?. It matches:

  • \s+ - 1 or more whitespace characters
  • < - an opening angle bracket
  • ([^<>]+) - (Group 2) 1 or more characters other than opening or closing angle brackets
  • > - a closing angle bracket

The re.findall function gets all capture groups into a list of tuples:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

UPDATE:

If you need the email to always be the second element of the tuple, use this code:

lst = re.findall(p, test_str)
print([(tpl[1], tpl[0]) if not tpl[1] else tpl for tpl in lst])
# => [('Mr ABC', '[email protected]'), ('Foo, Bar', '[email protected]'), ('', '[email protected]')]
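
The pattern and the tuple swap above can be combined into one helper. This is only a sketch: the name parse_quoted and the example.com addresses are hypothetical and not part of the original answer.

```python
import re

# Same pattern as above: a quoted string, optionally followed by <...>.
PATTERN = re.compile(r'"([^"]+)"(?:\s+<([^<>]+)>)?')

def parse_quoted(text):
    """Return (name, email) tuples; hypothetical helper for illustration."""
    result = []
    for first, second in PATTERN.findall(text):
        # When the optional <...> part is absent, group 1 holds the address.
        result.append((first, second) if second else ("", first))
    return result

sample = '"Mr ABC" <abc@example.com>, "Foo, Bar" <foo@example.com>, "baz@example.com"'
pairs = parse_quoted(sample)
print(pairs)
# [('Mr ABC', 'abc@example.com'), ('Foo, Bar', 'foo@example.com'), ('', 'baz@example.com')]
```

One caveat: the pattern requires double quotes, so a bare unquoted address such as the third item in the question's input is not matched at all.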

Upvotes: 4
