moh80s

Reputation: 761

Backreference conditionals in Python's "regex" and "re" modules do not work as expected

I am trying to match only North American phone numbers in a string. (123)456-7890 and 123-456-7890 are both acceptable formats for North American phone numbers; any other pattern should not match.

Note: I am using Python 3.7 and the PyCharm editor.

Here are phone numbers represented in a string:

123-456-7890 (123)456-7890 (123)-456-7890 (123-456-7890 1234567890 123 456 7890

I tried the regex (\()?\d{3}(?(1)\)|-)\d{3}-\d{4}, which uses a backreference conditional to match the desired phone numbers. The Python code is included below:

import regex
st = """
123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890
"""
pat = regex.compile(r'(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}', regex.I)
out = pat.findall(st)
print(out)

Output using findall method: ['', '(', '']

Output using search(st).group() method which returns just the first match: 123-456-7890

Expected matches: 123-456-7890 and (123)456-7890

My question is: why does findall, which should return the matched text as the regex101 website does, return results like ['', '(', ''] instead?

I have tried the regex on the regex101 website and it works perfectly there, but not here.

Note: I am using the book Sams Teach Yourself Regular Expressions; page 134 suggests this as the best solution for the problem, and the above is its Python implementation.

Upvotes: 2

Views: 83

Answers (2)

RafalS

Reputation: 6334

Use re.finditer: print(list(pat.finditer(st)))
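Printing the list directly shows Match objects; to get the matched strings, take .group() of each one. A minimal sketch, assuming the standard-library re module (which also supports the (?(1)...|...) conditional) in place of regex, and reusing the sample text from the question. The name matches is only illustrative:

```python
import re

st = """
123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890
"""

# Same pattern as in the question, unanchored.
pat = re.compile(r'(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}')

# finditer yields Match objects; group() (same as group(0)) is the whole match.
matches = [m.group() for m in pat.finditer(st)]
print(matches)  # ['123-456-7890', '(123)456-7890', '123-456-7890']
```

The third entry comes from the line (123-456-7890: without anchors, the pattern skips the opening parenthesis and matches the rest of the line.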

Upvotes: 1

anubhava

Reputation: 785256

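Your regex is correct, but when a pattern contains a capture group, findall returns the text of that group for each match instead of the whole match (an empty string where the optional group did not participate). Better to use finditer and print .group() or .group(0). I have also anchored the pattern with ^ and $ under regex.M so that (123-456-7890 is not matched starting from its second character: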

>>> pat = regex.compile(r'^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}$', regex.M)
>>> for m in pat.finditer(st):
...     print(m.group())
...
123-456-7890
(123)456-7890
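To see the findall behaviour side by side, here is a small sketch, again assuming the standard-library re module rather than regex. The second pattern is a swapped-in alternative, not the book's solution: it replaces the conditional with a non-capturing alternation, so there is no capture group and findall returns whole matches. The names with_group and no_group are only illustrative:

```python
import re

st = """
123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890
"""

# One capture group: findall returns that group's text for each match,
# and '' where the optional group did not participate.
with_group = re.findall(r'(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}', st)

# No capture groups (alternation instead of a conditional), anchored per line
# with re.M: findall returns the full matched strings.
no_group = re.findall(r'^(?:\(\d{3}\)|\d{3}-)\d{3}-\d{4}$', st, re.M)

print(with_group)  # ['', '(', '']
print(no_group)    # ['123-456-7890', '(123)456-7890']
```

Either approach works; the alternation form is convenient when only the matched text is needed, while finditer keeps the original conditional pattern intact.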

Upvotes: 1
