Reputation: 337
Input (comma separated list):
"\"Mr ABC\" <[email protected]>, \"Foo, Bar\" <[email protected]>, [email protected]"
Expected output (list of 2-tuples):
[("Mr ABC", "[email protected]"), ("Foo, Bar", "[email protected]"), ("", "[email protected]")]
I could split on commas and then call email.utils.parseaddr(address) on each piece,
but the name part can also contain a comma, as in "Foo, Bar" above.
email.utils.getaddresses(fieldvalues)
is very close to what I need, but it accepts a sequence, not a comma-separated string.
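To make the problem concrete, here is a minimal sketch of the naive comma split. The example.com addresses and the variable names are hypothetical stand-ins for illustration, not the real data.

```python
# Hypothetical example.com addresses, used only for illustration.
header = '"Mr ABC" <abc@example.com>, "Foo, Bar" <foo@example.com>, baz@example.com'

# Naive approach: split on every comma before parsing each piece.
pieces = [piece.strip() for piece in header.split(",")]

# The comma inside the quoted name "Foo, Bar" is treated as a separator too,
# so three addresses come out as four fragments.
for piece in pieces:
    print(repr(piece))
```

Two of the fragments are halves of one address, so passing them to parseaddr one by one cannot recover the name "Foo, Bar".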
Upvotes: 0
Views: 982
Reputation: 39889
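Use email.utils.getaddresses for that. It expects a sequence of header values, so wrap the whole comma-separated string in a one-element list:
from email.utils import getaddresses
emails = getaddresses(['"Mr ABC" <[email protected]>, "Foo, Bar" <[email protected]>, "[email protected]"'])
# => [('Mr ABC', '[email protected]'), ('Foo, Bar', '[email protected]'), ('', '[email protected]')]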
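As a rough sketch, the call can be wrapped in a small helper that takes the comma-separated string directly. The helper name parse_address_list and the example.com addresses are assumptions made for illustration; only getaddresses comes from the standard library.

```python
from email.utils import getaddresses


def parse_address_list(header: str) -> list[tuple[str, str]]:
    """Split one comma-separated address header into (name, email) pairs.

    getaddresses expects a sequence of header values, so the single
    string is passed as a one-element list.
    """
    return getaddresses([header])


# Hypothetical example.com addresses, used only for illustration.
header = '"Mr ABC" <abc@example.com>, "Foo, Bar" <foo@example.com>, baz@example.com'
result = parse_address_list(header)
print(result)
```

The comma inside the quoted name is kept as part of the name, because the parser understands quoted strings instead of splitting blindly.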
Upvotes: 1
Reputation: 627082
You may use the following regex (it assumes every item starts with a double-quoted string, as in the test string below):
import re
p = re.compile(r'"([^"]+)"(?:\s+<([^<>]+)>)?')
test_str = '"Mr ABC" <[email protected]>, "Foo, Bar" <[email protected]>, "[email protected]"'
print(re.findall(p, test_str))
Output: [('Mr ABC', '[email protected]'), ('Foo, Bar', '[email protected]'), ('[email protected]', '')]
See IDEONE demo
The regex matches:
" - a double quote
([^"]+) - (Group 1) 1 or more characters other than a double quote
" - a double quote
Then, an optional non-capturing group is introduced with the (?:...)? construct: (?:\s+<([^<>]+)>)?. It matches:
\s+ - 1 or more whitespace characters
< - an opening angle bracket
([^<>]+) - (Group 2) 1 or more characters other than opening or closing angle brackets
> - a closing angle bracket
The re.findall function gets all capture groups into a list of tuples:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
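A small sketch of that behavior, using the same pattern on a shortened input. The example.com addresses and variable names are hypothetical. Note how an optional group that does not take part in a match shows up as an empty string in the tuple.

```python
import re

# Same pattern as above: a quoted string, optionally followed by <...>.
pattern = re.compile(r'"([^"]+)"(?:\s+<([^<>]+)>)?')

# Hypothetical input: one full "name" <email> item and one quoted bare email.
sample = '"A" <a@example.com>, "b@example.com"'

# Two capture groups, so findall returns a list of 2-tuples.
matches = pattern.findall(sample)
print(matches)
```

For the second item only Group 1 matches, so the email lands in the first slot and the second slot is empty, which is what the update below corrects.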
UPDATE:
In case you need to make sure the email is the second element in the tuple, use this code (see demo):
lst = re.findall(p, test_str)
print([(tpl[1], tpl[0]) if not tpl[1] else tpl for tpl in lst])
# => [('Mr ABC', '[email protected]'), ('Foo, Bar', '[email protected]'), ('', '[email protected]')]
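Putting the pattern and the swap together, a minimal sketch of a reusable function might look like this. The function name parse_quoted_addresses and the example.com addresses are assumptions made for illustration. The sketch inherits the regex's limitation: an address that is neither quoted nor preceded by a quoted name is not matched at all.

```python
import re

# Quoted string, optionally followed by whitespace and an <email> part.
ADDRESS_RE = re.compile(r'"([^"]+)"(?:\s+<([^<>]+)>)?')


def parse_quoted_addresses(text: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs for every quoted item in text."""
    pairs = []
    for name, email in ADDRESS_RE.findall(text):
        if not email:
            # Only the quoted part matched, so it is the email itself.
            name, email = "", name
        pairs.append((name, email))
    return pairs


# Hypothetical example.com addresses, used only for illustration.
quoted = '"Mr ABC" <abc@example.com>, "Foo, Bar" <foo@example.com>, "baz@example.com"'
print(parse_quoted_addresses(quoted))

# An unquoted bare address is silently skipped by this pattern.
unquoted = '"Mr ABC" <abc@example.com>, baz@example.com'
print(parse_quoted_addresses(unquoted))
```

If the input can contain unquoted bare addresses, as in the original question, email.utils.getaddresses is likely the safer choice.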
Upvotes: 4