Python Regex: Backreference a matching regex group

Question

I am trying to return 2 subgroups from my regex match:

import re

email_add = "John@Doe.com <John@Doe.com>"
m = re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)

But it doesn't seem to match:

>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I suspect I either grouped it incorrectly or am using the wrong word boundary. I tried \w instead of \b, but the result is the same.

Could someone please point out my errors?

Martijn Pieters · Accepted Answer

You are matching uppercase A-Z letters only, so the lowercase character sequences ohn, oe, and com prevent the pattern from matching anything.
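To see the failure in isolation, here is a minimal Python 3 sketch (the name upper_only is illustrative, not from the question) that applies an uppercase-only class to an all-caps and a mixed-case local part:

```python
import re

# Without any flags, the class [A-Z] matches only the 26 uppercase ASCII letters.
upper_only = re.compile(r"[A-Z]+@")

for text in ("JOHN@", "John@"):
    m = upper_only.match(text)
    # "John@" fails: [A-Z]+ consumes "J", then "o" is neither [A-Z] nor "@".
    print(text, "->", m.group() if m else None)
```

This prints JOHN@ -> JOHN@ followed by John@ -> None, which is exactly what happens to "John" and "Doe" in the full pattern.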

Adding the re.I case-insensitive flag makes your pattern work:

>>> import re
>>> email_add = "John@Doe.com <John@Doe.com>"
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add, re.I)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('John@Doe.com', 'John@Doe.com')
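The same fix as a standalone Python 3 script, compiling the pattern once with re.IGNORECASE (the long form of re.I). ADDR and pattern are illustrative names; the address sub-pattern itself is unchanged from the question:

```python
import re

# The address sub-pattern from the question, written once and reused for both groups.
ADDR = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# re.IGNORECASE lets [A-Z] also match lowercase letters such as "ohn" and "oe".
pattern = re.compile(rf"({ADDR}) <({ADDR})", re.IGNORECASE)

email_add = "John@Doe.com <John@Doe.com>"
m = pattern.match(email_add)
print(m.groups() if m else None)  # ('John@Doe.com', 'John@Doe.com')
```

Building the pattern from one ADDR string keeps the two groups from drifting apart if the address rule is later edited.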

Alternatively, you could add a-z to each character class instead:

>>> re.match(r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b) <(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)", email_add)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('John@Doe.com', 'John@Doe.com')
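Since the title asks about backreferencing: if the bracketed address is always expected to repeat the first one exactly (an assumption; the original pattern accepts two different addresses), a backreference \1 can replace the second copy of the sub-pattern. A hypothetical sketch (same_addr is an illustrative name):

```python
import re

# Hypothetical variant: assumes the bracketed address must repeat the first one exactly.
# \1 is a backreference: it matches the same text that group 1 captured.
# With re.IGNORECASE the backreference comparison is case-insensitive as well.
same_addr = re.compile(
    r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <\1>",
    re.IGNORECASE,
)

for text in ("John@Doe.com <John@Doe.com>", "John@Doe.com <Jane@Doe.com>"):
    m = same_addr.match(text)
    print(text, "->", m.group(1) if m else None)
```

The first string matches with group 1 equal to John@Doe.com; the second returns None because Jane@Doe.com is not a repeat of the captured text.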
