Michael Johnson

Reputation: 470

Regular expression finding '\n'

I'm in the process of making a program to pattern match phone numbers in text.

I'm loading this text:

(01111-222222)fdf
01111222222
(01111)222222
01111 222222
01111.222222

into a variable, and "findall" returns this:

('(01111-222222)', '(01111', '-', '222222)')
('\n011112', '', '\n', '011112')
('(01111)222222', '(01111)', '', '222222')
('01111 222222', '01111', ' ', '222222')
('01111.222222', '01111', '.', '222222')

This is my expression:

ex = re.compile(r"""(
    (\(?0\d{4}\)?)?       # Area code
    (\s*\-*\.*)?          # separator
    (\(?\d{6}\)?)        # Local number
     )""", re.VERBOSE)

I don't understand why the '\n' is being caught.

If I replace the * in '\.*' with '+', the expression works as I want. It also works if I simply remove the * (and accept that the two sets of numbers can be separated by only a single period).

Upvotes: 4

Views: 103

Answers (1)

Wiktor Stribiżew

Reputation: 626690

\s matches both horizontal and vertical whitespace, including line breaks, which is why '\n' ends up in the separator group. Since you are using re.VERBOSE, you can match a regular space with an escaped space (a backslash followed by a space). Alternatively, exclude \r and \n from \s with [^\S\r\n] to match only horizontal whitespace.
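As a quick illustration of the difference, here is a minimal sketch. The string `sample` and the variable names are made up for the demonstration and are not part of the question's data.

```python
import re

# Hypothetical sample: "a", space, tab, newline, "b"
sample = "a \t\nb"

# \s also consumes the newline, so the run spans the line break
any_ws = re.findall(r"\s+", sample)

# [^\S\r\n] reads as "whitespace, but not CR or LF": horizontal whitespace only
horizontal_ws = re.findall(r"[^\S\r\n]+", sample)

# Under re.VERBOSE an unescaped space in the pattern is ignored,
# so a literal space has to be written as "\ "
literal_space = re.findall(r"a\ ", sample, re.VERBOSE)

print(any_ws)         # [' \t\n']
print(horizontal_ws)  # [' \t']
print(literal_space)  # ['a ']
```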

Use

ex = re.compile(r"""(
    (\(?0\d{4}\)?)?       # Area code
    ([^\S\r\n]*-*\.*)?   # separator   ((HERE))
    (\(?\d{6}\)?)        # Local number
     )""", re.VERBOSE)

See the regex demo

Also, the - outside a character class does not require escaping.
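To check the adjusted pattern against the sample text from the question, something like the following sketch should do. The variable `text` is an assumption about how the input was loaded (one entry per line, with a trailing newline); the pattern is the one given above.

```python
import re

# Assumed reconstruction of the question's input, one entry per line
text = (
    "(01111-222222)fdf\n"
    "01111222222\n"
    "(01111)222222\n"
    "01111 222222\n"
    "01111.222222\n"
)

ex = re.compile(r"""(
    (\(?0\d{4}\)?)?       # Area code
    ([^\S\r\n]*-*\.*)?    # Separator: horizontal whitespace, hyphens, periods
    (\(?\d{6}\)?)         # Local number
     )""", re.VERBOSE)

matches = ex.findall(text)
for m in matches:
    print(m)
# ('(01111-222222)', '(01111', '-', '222222)')
# ('01111222222', '01111', '', '222222')
# ('(01111)222222', '(01111)', '', '222222')
# ('01111 222222', '01111', ' ', '222222')
# ('01111.222222', '01111', '.', '222222')
```

Because the separator can no longer consume a line break, the match on the second line starts at the area code instead of at the preceding '\n'.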

Upvotes: 4
