Reputation: 4402
I am trying to return 2 subgroups from my regex match:
email_add = "[email protected] <[email protected]>"
m = re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
But it doesn't seem to match:
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
I suspect I either grouped it incorrectly or am using an incorrect word boundary. I tried \w instead of \b, but the result is the same.
Could someone please point out my errors?
Upvotes: 0
Views: 615
Reputation: 142166
What's wrong with your regex has been pointed out, but you may also want to consider email.utils.parseaddr:
>>> from email.utils import parseaddr
>>> email_add = "[email protected] <[email protected]>"
>>> parseaddr(email_add)
('', '[email protected]') # doesn't get first part, so could assume it's same as 2nd?
>>> email_add = "John Doe <[email protected]>"
>>> parseaddr(email_add)
('John Doe', '[email protected]') # does get name and email
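As a rough sketch of the fallback hinted at in the comment above (treat the address as the name when no display name comes back), something like the following could work. The helper name name_and_address and the example.com addresses are made up for illustration; only parseaddr itself comes from the standard library.

```python
from email.utils import parseaddr


def name_and_address(raw):
    """Return (name, address); reuse the address when no display name is found.

    Hypothetical helper, not part of the standard library.
    """
    name, address = parseaddr(raw)
    # parseaddr returns '' for the name when the input has no display name.
    return (name or address, address)


# Placeholder inputs, not the addresses from the question.
with_name = name_and_address("John Doe <john.doe@example.com>")
bare = name_and_address("john.doe@example.com")
```

Whether reusing the address as the name is acceptable depends on what the name is needed for downstream.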
Upvotes: 2
Reputation: 1122222
You are matching uppercase A-Z letters only, so the lowercase character sequences ohn, oe and com cause the pattern not to match anything.
Adding the re.I case-insensitive flag makes your pattern work:
>>> import re
>>> email_add = "[email protected] <[email protected]>"
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add, re.I)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')
Or you could add a-z to the character classes instead:
>>> re.match(r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b) <(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)", email_add)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')
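Either fix can be wrapped so that a failed match never reaches .group() and raises the AttributeError from the question. The following is a minimal sketch, assuming the input looks like "address <address>"; the names EMAIL, PAIR_RE and split_pair and the example.com addresses are illustrative only.

```python
import re

# One copy of the address pattern from the question, reused for both groups.
EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# re.IGNORECASE is the long spelling of re.I.
PAIR_RE = re.compile("(" + EMAIL + ") <(" + EMAIL + ")", re.IGNORECASE)


def split_pair(text):
    """Return both captured addresses, or None when the text does not match."""
    m = PAIR_RE.match(text)
    if m is None:
        # re.match returns None on failure; check before calling m.groups().
        return None
    return m.groups()


# Placeholder input, not the address from the question.
email_add = "John.Doe@example.com <John.Doe@example.com>"
pair = split_pair(email_add)
missing = split_pair("not an address")
```

Compiling once also keeps the long pattern out of every call site, which makes a missing flag easier to spot.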
Upvotes: 2