Dirty Penguin

Reputation: 4402

Python Regex: Backreference a matching regex group

I am trying to return 2 subgroups from my regex match:

import re
email_add = "[email protected] <[email protected]>"
m = re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)

But it doesn't seem to match:

>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I suspect I either did not group it correctly or I'm using an incorrect word boundary. I tried \w instead of \b, but the result is the same.

Could someone please point out my errors?

Upvotes: 0

Views: 615

Answers (2)

Jon Clements

Reputation: 142166

What's wrong with your regex has been pointed out, but you may also want to consider email.utils.parseaddr:

>>> from email.utils import parseaddr
>>> email_add = "[email protected] <[email protected]>"
>>> parseaddr(email_add)
('', '[email protected]')  # doesn't get first part, so could assume it's same as 2nd?
>>> email_add = "John Doe <[email protected]>"
>>> parseaddr(email_add)
('John Doe', '[email protected]') # does get name and email
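If you go the parseaddr route, a thin wrapper can guard against input that yields no usable address. This is only a minimal sketch: extract_address is a hypothetical helper name, the "@" check is a deliberately crude sanity test rather than real validation, and the sample address is made up because the ones in the question are obscured.

```python
from email.utils import parseaddr


def extract_address(header_value):
    """Return (name, address), or None if no plausible address was parsed.

    Hypothetical helper: the "@" test is a rough sanity check only.
    """
    name, addr = parseaddr(header_value)
    if "@" not in addr:
        return None
    return name, addr


# Made-up sample input, not taken from the question.
print(extract_address("John Doe <john.doe@example.com>"))
```

Returning None for unusable input mirrors how re.match behaves, so the caller can use the same "is it None?" check either way.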

Upvotes: 2

Martijn Pieters

Reputation: 1122222

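Your character classes only match uppercase letters (A-Z), so lowercase sequences such as ohn, oe and com are not matched and the pattern as a whole fails. re.match then returns None, which is why calling .group() raises the AttributeError.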

Adding the re.I case-insensitive flag makes your pattern work:

>>> import re
>>> email_add = "[email protected] <[email protected]>"
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add, re.I)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')

Or you could add a-z to the character classes instead:

>>> re.match(r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b) <(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)", email_add)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')
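Putting the pieces together, here is a small sketch that compiles the pattern once with the case-insensitive flag and checks for None before touching the groups. The names ADDR, PATTERN and split_pair are hypothetical, the sample string is made up (the addresses in the question are obscured), and a closing > has been added to the pattern, which the original does not include.

```python
import re

# Made-up sample input; the real addresses in the question are obscured.
email_add = "John.Doe@example.com <John.Doe@example.com>"

# Same address sub-pattern as in the question, written once and reused.
ADDR = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# Assumption: a closing ">" is appended here; the original pattern stops before it.
PATTERN = re.compile(r"({0}) <({0})>".format(ADDR), re.IGNORECASE)


def split_pair(text):
    """Return both addresses as a tuple, or None when the text does not match."""
    m = PATTERN.match(text)
    if m is None:  # re.match returns None on failure, so check before using groups
        return None
    return m.groups()


# Without the flag, the uppercase-only classes fail on the lowercase letters.
case_sensitive = re.match(r"({0}) <({0})>".format(ADDR), email_add)
pair = split_pair(email_add)

print(case_sensitive)
print(pair)
```

Checking the result against None before calling .group() or .groups() avoids the AttributeError from the question when the input does not match.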

Upvotes: 2
