Regular expression finding '\n'

Question

I'm in the process of making a program to pattern match phone numbers in text.

I'm loading this text:

(01111-222222)fdf
01111222222
(01111)222222
01111 222222
01111.222222

Into a variable, and using "findall" it's returning this:

('(01111-222222)', '(01111', '-', '222222)')
('\n011112', '', '\n', '011112')
('(01111)222222', '(01111)', '', '222222')
('01111 222222', '01111', ' ', '222222')
('01111.222222', '01111', '.', '222222')

This is my expression:

ex = re.compile(r"""(
    (\(?0\d{4}\)?)?       # Area code
    (\s*\-*\.*)?          # separator
    (\(?\d{6}\)?)        # Local number
     )""", re.VERBOSE)

I don't understand why the '\n' is being caught.

If the * in '\.*' is replaced with '+', the expression works as I want. It also works if I simply remove the * (and accept matching the two sets of numbers only when they are separated by a single period).

Wiktor Stribiżew · Accepted Answer

The \s matches both horizontal and vertical whitespace, so it also matches line breaks such as '\n'. Since you are using re.VERBOSE, you can match a literal space with an escaped space, "\ ". Or, you may exclude \r and \n from \s with [^\S\r\n] so that the separator cannot cross a line break.
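To illustrate the difference between these classes, here is a minimal sketch. The sample string and the variable names are made up for the illustration and are not part of the question.

```python
import re

# Hypothetical sample: a space, a tab, a carriage return and a newline between two letters.
sample = "a \t\r\nb"

# \s matches every whitespace character, line breaks included.
all_ws = re.findall(r"\s", sample)

# [^\S\r\n] reads as "not non-whitespace, and not CR or LF",
# i.e. whitespace with the two line-break characters excluded.
no_breaks = re.findall(r"[^\S\r\n]", sample)

# Under re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so a literal space has to be written as an escaped space.
escaped_space = re.findall(r"a\ b", "a b", re.VERBOSE)
ignored_space = re.findall(r"a b", "ab", re.VERBOSE)

print(all_ws)         # [' ', '\t', '\r', '\n']
print(no_breaks)      # [' ', '\t']
print(escaped_space)  # ['a b']
print(ignored_space)  # ['ab']
```

Note that [^\S\r\n] still matches other whitespace such as tabs. If only a plain space should be allowed as a separator, the escaped space is the stricter choice.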

Use

ex = re.compile(r"""(
    (\(?0\d{4}\)?)?       # Area code
    ([^\S\r\n]*-*\.*)?   # separator   ((HERE))
    (\(?\d{6}\)?)        # Local number
     )""", re.VERBOSE)

See the regex demo

Also, the - outside a character class does not require escaping.
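As a quick check, the sketch below runs both separators over the text from the question. It assumes the parentheses in the patterns are the escaped literals \( and \), and the names text, original, fixed, original_matches and fixed_matches are chosen only for this illustration.

```python
import re

# Sample text from the question, one candidate number per line.
text = (
    "(01111-222222)fdf\n"
    "01111222222\n"
    "(01111)222222\n"
    "01111 222222\n"
    "01111.222222"
)

# Separator as in the question: \s* can consume the line break before a number.
original = re.compile(r"""(
    (\(?0\d{4}\)?)?       # area code
    (\s*-*\.*)?           # separator, hyphen left unescaped
    (\(?\d{6}\)?)         # local number
    )""", re.VERBOSE)

# Adjusted separator: whitespace other than CR and LF.
fixed = re.compile(r"""(
    (\(?0\d{4}\)?)?       # area code
    ([^\S\r\n]*-*\.*)?    # separator
    (\(?\d{6}\)?)         # local number
    )""", re.VERBOSE)

# findall returns one tuple per match; element 0 is the whole match.
original_matches = [groups[0] for groups in original.findall(text)]
fixed_matches = [groups[0] for groups in fixed.findall(text)]

print(original_matches)
# ['(01111-222222)', '\n011112', '(01111)222222', '01111 222222', '01111.222222']
print(fixed_matches)
# ['(01111-222222)', '01111222222', '(01111)222222', '01111 222222', '01111.222222']
```

With \s* the second match starts at the newline: the optional area code is skipped, the separator takes the '\n', and the first six digits are read as the local number. With the line breaks excluded, the match can only start at the 0.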
