Reputation: 470
I'm in the process of making a program to pattern match phone numbers in text.
I'm loading this text:
(01111-222222)fdf
01111222222
(01111)222222
01111 222222
01111.222222
into a variable, and findall returns this:
('(01111-222222)', '(01111', '-', '222222)')
('\n011112', '', '\n', '011112')
('(01111)222222', '(01111)', '', '222222')
('01111 222222', '01111', ' ', '222222')
('01111.222222', '01111', '.', '222222')
This is my expression:
ex = re.compile(r"""(
(\(?0\d{4}\)?)? # Area code
(\s*\-*\.*)? # separator
(\(?\d{6}\)?) # Local number
)""", re.VERBOSE)
I don't understand why the '\n' is being caught.
If the * in '\.*' is replaced by +, the expression works as I want. It also works if I simply remove the * (and accept finding the two sets of numbers separated by only a single period).
Upvotes: 4
Views: 103
Reputation: 626690
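\s matches both horizontal and vertical whitespace, so it also matches line breaks. Because the pattern is compiled with re.VERBOSE, where unescaped spaces in the pattern are ignored, you can match a literal space with an escaped space (a backslash followed by a space). Alternatively, exclude \r and \n from \s with [^\S\r\n] to match only horizontal whitespace.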
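As a quick illustration of the difference, here is a minimal sketch. The sample string and the variable names are made up for this example and are not part of the original question.

```python
import re

# Hypothetical sample containing a space, a tab, LF, and CRLF.
sample = "a b\tc\nd\r\ne"

# \s matches every whitespace character, including line breaks.
all_ws = re.findall(r"\s", sample)

# [^\S\r\n] reads "not a non-whitespace character, not CR, not LF",
# which leaves only horizontal whitespace.
horizontal = re.findall(r"[^\S\r\n]", sample)

# Under re.VERBOSE an unescaped space in the pattern is ignored,
# so a literal space has to be escaped.
escaped = re.findall(r"a\ b", sample, re.VERBOSE)
unescaped = re.findall(r"a b", sample, re.VERBOSE)  # behaves like "ab"

print(all_ws)      # [' ', '\t', '\n', '\r', '\n']
print(horizontal)  # [' ', '\t']
print(escaped)     # ['a b']
print(unescaped)   # []
```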
Use
ex = re.compile(r"""(
(\(?0\d{4}\)?)? # Area code
([^\S\r\n]*-*\.*)? # separator (changed here)
(\(?\d{6}\)?) # Local number
)""", re.VERBOSE)
Also, the - outside a character class does not require escaping.
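To check the corrected pattern against the sample text from the question, something like the following should work. It is only a sketch: the names text and matches are introduced here for illustration, and group 1 is used to pull out the full match from each result.

```python
import re

# Sample text from the question.
text = """(01111-222222)fdf
01111222222
(01111)222222
01111 222222
01111.222222"""

ex = re.compile(r"""(
    (\(?0\d{4}\)?)?       # Area code
    ([^\S\r\n]*-*\.*)?    # Separator: horizontal whitespace, hyphens, dots
    (\(?\d{6}\)?)         # Local number
)""", re.VERBOSE)

# Group 1 wraps the whole expression, so it holds the complete number.
matches = [m.group(1) for m in ex.finditer(text)]

for number in matches:
    print(repr(number))
# '(01111-222222)'
# '01111222222'
# '(01111)222222'
# '01111 222222'
# '01111.222222'
```

Because the separator can no longer consume a line break, the second line is matched as a whole number instead of starting at the preceding '\n'. The unescaped - in the separator behaves the same as the escaped one.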
Upvotes: 4