mpenkov

Reputation: 21906

Python regex negative lookbehind not failing match

I'm writing a regex to match phone numbers. One of the problems I've encountered is that some postcodes look like phone numbers. For example, in Brazil, postcodes look like this:

30.160-0131

So a simple regex will capture them as false positives:

In [63]: re.search(r"(?P<phone>\d+\.\d+-\d+)", "30.160-0131")
Out[63]: <_sre.SRE_Match at 0x102150990>

Luckily, such postcodes often come with a prefix which generally means "postcode", like this:

CEP 30.160-0131

So, if you see CEP in front of something that looks like a phone number, then it's not a phone number - it's a postcode. I've been trying to write a regex to capture that using negative lookbehind, but it's not working. It still matches:

In [62]: re.search(r"(?<!CEP )(\d+\.\d+-\d+)", "CEP 30.160-0131")
Out[62]: <_sre.SRE_Match at 0x102150eb8>

Why does it still match, and how can I get the negative look-behind to fail the match?

Upvotes: 3

Views: 193

Answers (2)

Jerry

Reputation: 71538

You can avoid the negative lookbehind altogether if you let the regex match those postcodes too, and capture only the phone numbers:

m = re.search(r"CEP \d+\.\d+-\d+|(\d+\.\d+-\d+)", s)

Then check m.group(1): it holds the phone number when the second alternative matched, and is None when the match was a CEP postcode.
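A minimal sketch of that check. The helper name find_phone is hypothetical, and it uses re.finditer rather than re.search so that a leading CEP match does not hide a phone number later in the string:

```python
import re

# First branch consumes "CEP <number>" without capturing anything;
# second branch captures a bare number as group 1.
PATTERN = re.compile(r"CEP \d+\.\d+-\d+|(\d+\.\d+-\d+)")

def find_phone(s):
    """Return the first number in s that is not prefixed by "CEP ", else None."""
    for m in PATTERN.finditer(s):
        # group(1) is None whenever the CEP branch produced the match
        if m.group(1) is not None:
            return m.group(1)
    return None
```

With this helper, find_phone("CEP 30.160-0131") returns None, while find_phone("30.160-0131") returns the number itself.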


Little demo with re.findall:

>>> import re
>>> s = "There is a CEP 30.160-0131 and a  30.160-0132 in that sentence, which repeats itself like there is a CEP 30.160-0131 and a  30.160-0132 in that sentence."
>>> m = re.findall(r"CEP \d+\.\d+-\d+|(\d+\.\d+-\d+)", s)
>>> print(m)
['', '30.160-0132', '', '30.160-0132']

And from there, you can filter out the empty strings.
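For instance, a list comprehension is one way to drop the empty strings (the variable names here are only illustrative):

```python
import re

s = "There is a CEP 30.160-0131 and a  30.160-0132 in that sentence."

# findall returns group 1 for every match; CEP matches yield ''
matches = re.findall(r"CEP \d+\.\d+-\d+|(\d+\.\d+-\d+)", s)

# Keep only the non-empty captures, i.e. the phone numbers
phones = [m for m in matches if m]
print(phones)  # ['30.160-0132']
```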

Upvotes: 1

perreal

Reputation: 97938

The expression matches because nothing anchors the start of the number, so the regex engine is free to begin the match one digit later. For example:

"CEP 11.213-132"

will match 1.213-132, since that substring does not immediately follow "CEP ". But you can require whitespace, or the start of the line, immediately before the first digit:

re.search(r"(?<!CEP)(?:\s+|^)(\d+\.\d+-\d+)", s)
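A short sketch comparing the two patterns on the string from the question. The names naive and anchored are illustrative; the patterns are the ones shown in this thread:

```python
import re

s = "CEP 30.160-0131"

# Original pattern: the lookbehind rejects a match starting at "3",
# so the engine retries one character later, where the preceding
# four characters are "EP 3" rather than "CEP ".
naive = re.search(r"(?<!CEP )(\d+\.\d+-\d+)", s)
print(naive.group(1))  # 0.160-0131

# Anchored pattern: the number must be preceded by whitespace or the
# start of the string, and that whitespace must not follow "CEP".
anchored = re.compile(r"(?<!CEP)(?:\s+|^)(\d+\.\d+-\d+)")
print(anchored.search(s))  # None
print(anchored.search("call 30.160-0131").group(1))  # 30.160-0131
```

Note that the phone number ends up in group 1, because the full match also includes the leading whitespace.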

Upvotes: 3
