Reputation: 13970
I need a Python regular expression to match integers but not floats from a string input.
The following regex uses a negative lookahead and a negative lookbehind to make sure that a number is neither preceded nor followed by a '.'.
(?<!\.)[0-9]+(?!\.)
It works only when each float has a single digit on either side of the decimal point, e.g.:
import re
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
str_int_list = int_regex.findall(text)
Correct when there is no more than one digit on each side of the decimal point:
"1 + 2 + 3.0 + .4 + 5. + 66 + 777" --> ['1', '2', '66', '777']
Incorrectly matches the '1' of '12.3' and the '5' of '.45':
"12.3 + .45 + 678" --> ['1', '5', '678']
The problem appears to be that the [0-9]+ in the middle of the regex is not greedy enough.
I tried adding digit matches to the lookahead and lookbehind, but ran into Python's restriction that lookbehinds must be fixed-width (the 'look-behind requires fixed-width pattern' error).
Any suggestions as to how to match only whole integers and no floats at all would be really appreciated.
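For reference, the fixed-width restriction can be reproduced with a short sketch like the one below. The pattern is only an illustrative guess at a variable-width lookbehind, not necessarily the exact attempt described above, and the `compiled` flag is a hypothetical name used for the demonstration.

```python
import re

# A lookbehind containing [0-9]* has no fixed width, so Python's re module
# refuses to compile it. The pattern here is an assumed example.
try:
    re.compile(r"(?<!\.[0-9]*)[0-9]+")
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```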
Upvotes: 2
Views: 3398
Reputation: 52039
Simply add \d to the lookahead and lookbehind patterns:
import re
# Raw strings keep the backslashes intact for the regex engine.
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
re2 = re.compile(r"(?<![\.\d])[0-9]+(?![\.\d])")
text = "1 + 2 + 3.0 + .4 + 5. - .45 + 66 + 777 - 12.3"
print("int_regex:", int_regex.findall(text))
print("re2 :", re2.findall(text))
int_regex: ['1', '2', '5', '66', '777', '1']
re2 : ['1', '2', '66', '777']
The lookahead/behind patterns define a number boundary (much like \b defines a word boundary) and the only thing you are allowing in the number is digits.
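As a minimal sketch of how this "number boundary" idea might be wrapped for reuse, the helper below converts the matches to int. The names `INT_PATTERN` and `find_integers` are hypothetical and chosen only for illustration.

```python
import re

# A digit or a dot on either side means we are inside a float
# (or in the middle of a longer number), so the match is rejected.
INT_PATTERN = re.compile(r"(?<![\d.])[0-9]+(?![\d.])")

def find_integers(text):
    """Return the standalone integers found in text as a list of int."""
    return [int(m) for m in INT_PATTERN.findall(text)]

print(find_integers("1 + 2 + 3.0 + .4 + 5. - .45 + 66 + 777 - 12.3"))
# prints [1, 2, 66, 777]
```

Note that a trailing dot, as in "5.", is treated as part of a float here, so an integer that ends a sentence ("I have 5.") would be skipped as well.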
Upvotes: 2
Reputation: 43166
Since the negative lookbehind and lookahead won't allow dots, the regex engine simply backtracks by one digit as soon as it does encounter a dot, causing the regex to match only a part of a number.
To prevent this, add digits to the lookarounds:
(?<![\d.])[0-9]+(?![\d.])
or use word boundaries (\b):
(?<!\.)\b[0-9]+\b(?!\.)
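The two fixes are not quite interchangeable, which a quick comparison can show. The sample text (including the token x86) and the variable names are assumptions made for this sketch: \b treats letters and underscores as word characters, so digits glued to a letter are rejected by the \b version but accepted by the digit-aware lookarounds.

```python
import re

original = re.compile(r"(?<!\.)[0-9]+(?!\.)")
digit_aware = re.compile(r"(?<![\d.])[0-9]+(?![\d.])")
word_bounded = re.compile(r"(?<!\.)\b[0-9]+\b(?!\.)")

text = "12.3 + .45 + 678 + x86"

# The original pattern backtracks into partial numbers; the two fixes
# differ only on the digits attached to the letter in "x86".
print("original    :", original.findall(text))      # ['1', '5', '678', '86']
print("digit_aware :", digit_aware.findall(text))   # ['678', '86']
print("word_bounded:", word_bounded.findall(text))  # ['678']
```

Which behaviour is preferable depends on whether digits inside identifiers should count as integers for the input at hand.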
Upvotes: 7