mattst

Reputation: 13970

Python regex to match integers but not floats

I need a Python regular expression to match integers but not floats from a string input.

The following regex uses a negative lookahead and a negative lookbehind to make sure that a number is neither preceded nor followed by a '.'.

(?<!\.)[0-9]+(?!\.)

It works only when the floats have a single digit on each side of the dot, e.g.:

import re

int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
str_int_list = int_regex.findall(text)  # text is the input string

Correct when there is no more than one digit on each side of the dot:
"1 + 2 + 3.0 + .4 + 5. + 66 + 777" --> ['1', '2', '66', '777']

Incorrectly matches the '1' of '12.3' and the '5' of '.45':
"12.3 + .45 + 678" --> ['1', '5', '678']

The problem appears to be that the [0-9]+ in the middle of the regex is not greedy enough.

I tried adding number matches to the lookahead and lookbehind, but ran into Python's error that lookbehinds must be fixed-width.

Any suggestions as to how to match only whole integers and no floats at all would be really appreciated.

Upvotes: 2

Views: 3398

Answers (2)

ErikR

Reputation: 52039

Simply add \d to the lookahead and lookbehind patterns:

import re

# Raw strings keep the backslashes intact for the regex engine.
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
re2 = re.compile(r"(?<![.\d])[0-9]+(?![.\d])")

text = "1 + 2 + 3.0 + .4 + 5. - .45 + 66 + 777 - 12.3"
print("int_regex:", int_regex.findall(text))
print("re2      :", re2.findall(text))

int_regex: ['1', '2', '5', '66', '777', '1']
re2      : ['1', '2', '66', '777']

The lookahead/behind patterns define a number boundary (much like \b defines a word boundary) and the only thing you are allowing in the number is digits.
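
As a rough sketch of how this could be wrapped up for reuse, the helper below applies the same lookarounds and converts the matches to int. The names INT_RE and find_integers are made up for this example and are not part of the question or of any library.

```python
import re

# Digits that are not touching another digit or a dot on either side.
INT_RE = re.compile(r"(?<![\d.])\d+(?![\d.])")

def find_integers(text):
    """Return the standalone integers found in text as a list of ints."""
    return [int(token) for token in INT_RE.findall(text)]

print(find_integers("12.3 + .45 + 678"))  # prints [678]
```

Because the lookarounds are zero-width, nothing besides the digits themselves ends up in the result.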

Upvotes: 2

Aran-Fey

Reputation: 43166

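Since the negative lookbehind and lookahead only reject dots, the regex engine simply gives up one digit when it runs into a dot: it backtracks by one digit when a dot follows (the '1' of '12.3') or starts the match one digit later when a dot precedes (the '5' of '.45'). Either way, the regex matches only part of a number.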
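
To see where the partial matches come from, here is a small sketch that prints the span of each match produced by the pattern from the question. The variable names naive and sample are only for illustration.

```python
import re

# The pattern from the question: only dots are excluded by the lookarounds.
naive = re.compile(r"(?<!\.)[0-9]+(?!\.)")

sample = "12.3 + .45 + 678"
matches = [(m.span(), m.group()) for m in naive.finditer(sample)]

for span, digits in matches:
    print(span, digits)
# (0, 1) 1      <- '12' is followed by '.', so the engine backs off to '1'
# (9, 10) 5     <- '4' is preceded by '.', so the match starts at '5'
# (13, 16) 678
```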

To prevent this, add digits to the lookarounds:

(?<![\d.])[0-9]+(?![\d.])

or use boundaries \b:

(?<!\.)\b[0-9]+\b(?!\.)
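
The two variants are not quite interchangeable. The sketch below compares them on an input where digits are glued to letters. The names with_digits, with_boundary and sample are assumptions made for this example. Since \b treats letters and underscores as word characters, the boundary version skips digits attached to them, while the digit-lookaround version still picks them up.

```python
import re

# Variant 1: exclude digits and dots on both sides.
with_digits = re.compile(r"(?<![\d.])[0-9]+(?![\d.])")
# Variant 2: require word boundaries and exclude dots.
with_boundary = re.compile(r"(?<!\.)\b[0-9]+\b(?!\.)")

sample = "abc123 + 67 + 8.9"

print(with_digits.findall(sample))    # prints ['123', '67']
print(with_boundary.findall(sample))  # prints ['67']
```

Which behaviour is preferable depends on whether something like 'abc123' should count as containing an integer.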

Upvotes: 7
