Reputation: 761
I am trying to match only North American phone numbers in a string. (123)456-7890 and 123-456-7890 are both acceptable presentation formats for North American phone numbers, so no other pattern should match.
Note: I am using Python 3.7 and the PyCharm editor.
Here are phone numbers represented in a string:
123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890
I tried the regex (\()?\d{3}(?(1)\)|-)\d{3}-\d{4}, which uses a backreference conditional to match the desired phone numbers. Here is the Python code:
import regex
st = """
123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890
"""
pat = regex.compile(r'(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}', regex.I)
out = pat.findall(st)
print(out)
Output of the findall method: ['', '(', '']
Output of search(st).group(), which returns only the first match: 123-456-7890
The expected matches are:
123-456-7890
(123)456-7890
My question is: why does findall return confusing results like ['', '(', ''] instead of the matched phone numbers?
I have tried the regex on the regex101 website and it works perfectly there, but not here.
Note: I am using the book Sams Teach Yourself Regular Expressions; on page 134 it suggests this as the best solution to the problem, and the code above is my Python implementation of it.
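A quick way to confirm that the pattern itself accepts only the two intended formats is to test each sample line on its own with fullmatch. This is a minimal sketch that assumes the standard-library re module, which also supports the (?(1)yes|no) conditional, so the third-party regex package is not needed for this check; PHONE, samples and valid are illustrative names.

```python
import re

# Pattern from the question, compiled with the stdlib re module
# (re supports the same (?(1)yes|no) conditional syntax as regex).
PHONE = re.compile(r'(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}')

samples = [
    "123-456-7890",
    "(123)456-7890",
    "(123)-456-7890",
    "(123-456-7890",
    "1234567890",
    "123 456 7890",
]

# fullmatch succeeds only if the pattern consumes the whole string, so a
# partial hit (e.g. "123-456-7890" inside "(123-456-7890") does not count.
valid = [s for s in samples if PHONE.fullmatch(s)]

for s in samples:
    print(s, "->", "valid" if s in valid else "invalid")
```

Only 123-456-7890 and (123)456-7890 come out as valid, which suggests the pattern is fine and the surprise lies in how the results are collected.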
Upvotes: 2
Views: 83
Reputation: 785256
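Your regex is correct, but when a pattern contains capturing groups, findall returns the captured group values instead of the whole match, which is why you only see '' and '('. It is better to use finditer and print .group() or .group(0). Also anchor the pattern with ^ and $ in MULTILINE mode; otherwise it also matches the substring 123-456-7890 inside (123-456-7890: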
>>> pat = regex.compile(r'^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}$', regex.M)
>>> for m in pat.finditer(st):
...     print(m.group())
...
123-456-7890
(123)456-7890
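The difference between findall and finditer, and the effect of the anchors, can be compared side by side in the sketch below. It assumes the standard-library re module in place of regex (both accept this pattern syntax), and the variable names are illustrative. The last pattern is one possible alternative, not taken from the book, that uses only a non-capturing group, so findall returns whole matches directly.

```python
import re

st = """
123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890
"""

# Unanchored pattern from the question: it contains one capturing group.
unanchored = re.compile(r'(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}')

# With one group, findall returns that group's text for each match
# ('' when the optional group did not participate), not the whole match.
groups_only = unanchored.findall(st)
print(groups_only)  # ['', '(', '']

# finditer yields match objects; group() is the whole match.
whole_unanchored = [m.group() for m in unanchored.finditer(st)]
print(whole_unanchored)

# ^...$ with re.M anchors each line, dropping the partial hit.
anchored = re.compile(r'^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}$', re.M)
whole_anchored = [m.group(0) for m in anchored.finditer(st)]
print(whole_anchored)

# Alternative without capturing groups: findall now returns whole matches.
no_groups = re.findall(r'^(?:\(\d{3}\)|\d{3}-)\d{3}-\d{4}$', st, re.M)
print(no_groups)
```

With the unanchored pattern, finditer returns three hits, the third being the 123-456-7890 inside (123-456-7890; the anchored versions return only 123-456-7890 and (123)456-7890.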
Upvotes: 1