Regex negative lookaround with optional whitespace

Question

I am trying to match numbers that are not followed by certain words, using regular expressions in Python 3. My guess is that a negative lookaround is needed, but I'm struggling with the optional whitespace between the number and the word. See the following example:

'200 word1 some 50 foo and 5foo 30word2'

Note that in reality word1 and word2 can be any of a large number of different words, which makes a positive match on them impractical. It is therefore easier to exclude the numbers followed by foo. The expected result is:

[200, 30]

My try:

import re
s = '200 foo some 50 bar and 5bar 30foo'
pattern = r"[0-9]+\s?(?!foo)"
re.findall(pattern, s)

Results in

['200', '50 ', '5', '3']

Wiktor Stribiżew · Accepted Answer

You may use

import re
s = '200 word1 some 50 foo and 5foo 30word2'
pattern = r"\b[0-9]+(?!\s*foo|[0-9])"
print(re.findall(pattern, s))
# => ['200', '30']

Details

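\b - a word boundary
[0-9]+ - one or more ASCII digits only
(?!\s*foo|[0-9]) - a negative lookahead: the match fails if the digits are immediately followed by
- \s*foo - zero or more whitespace characters and then the string foo
- | - or
- [0-9] - an ASCII digit. Without this alternative, the engine could backtrack from 50 to 5 (which is followed by 0, not by foo) and report 5 as a match.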
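
To see what each part of the lookahead contributes, here is a small sketch that runs the pattern from the answer next to two weakened variants on the same input. The variant patterns and the variable names (full, no_digit_guard, no_whitespace) are illustrative assumptions and are not part of the original answer.

```python
import re

s = '200 word1 some 50 foo and 5foo 30word2'

# The pattern from the answer: both alternatives inside the lookahead.
full = r"\b[0-9]+(?!\s*foo|[0-9])"

# Hypothetical variant without the [0-9] alternative: after "50" is
# rejected, the engine backtracks to "5", which is followed by "0"
# rather than by foo, so "5" is reported.
no_digit_guard = r"\b[0-9]+(?!\s*foo)"

# Hypothetical variant without \s*: only foo glued directly to the
# digits is rejected, so the "50" in "50 foo" is reported.
no_whitespace = r"\b[0-9]+(?!foo|[0-9])"

print(re.findall(full, s))            # => ['200', '30']
print(re.findall(no_digit_guard, s))  # => ['200', '5', '30']
print(re.findall(no_whitespace, s))   # => ['200', '50', '30']
```

Only the full pattern returns the expected result. Each variant lets through exactly the case that its missing piece is there to block.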
