Jeroen

Reputation: 957

Regex negative lookaround with optional whitespace

I am trying to find numbers that are not followed by certain words, using regular expressions in Python 3. My guess is that a negative lookaround is needed, but I'm struggling with the optional whitespace between the number and the word. See the following example:

'200 word1 some 50 foo and 5foo 30word2'

Note that in reality word1 and word2 can be any of many different words, which makes it much harder to match them positively. It is therefore easier to exclude the numbers that are followed by foo. The expected result is:

[200, 30]

My try:

import re

s = '200 foo some 50 bar and 5bar 30foo'
pattern = r"[0-9]+\s?(?!foo)"
re.findall(pattern, s)

Results in

['200', '50 ', '5', '3']
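A quick way to see why the attempt returns '50 ' and '3' is to print each match together with its position. The sketch below reuses the string and pattern from the attempt; the comments describe the backtracking that produces each result.

```python
import re

s = '200 foo some 50 bar and 5bar 30foo'
attempt = r"[0-9]+\s?(?!foo)"

# "200": \s? first grabs the space, (?!foo) then sees "foo" and fails,
#        so \s? gives the space back and the lookahead passes on " foo".
# "50 ": the space is kept because "bar" is not "foo".
# "3":   "30" is followed by "foo", so [0-9]+ backtracks to "3"; the
#        lookahead then sees "0foo", which does not start with "foo".
for m in re.finditer(attempt, s):
    print(repr(m.group()), m.span())
# -> '200' (0, 3)
# -> '50 ' (13, 16)
# -> '5' (24, 25)
# -> '3' (29, 30)
```

The core problem is that neither \s? nor [0-9]+ is anchored: the engine is free to shorten either one until the lookahead happens to pass.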

Upvotes: 1

Views: 143

Answers (2)

Wiktor Stribiżew

Reputation: 627292

You may use

import re
s = '200 word1 some 50 foo and 5foo 30word2'
pattern = r"\b[0-9]+(?!\s*foo|[0-9])"
print(re.findall(pattern, s))
# => ['200', '30']

See the Python demo and the regex graph:


Details

  • \b - a word boundary
  • [0-9]+ - one or more ASCII digits
  • (?!\s*foo|[0-9]) - not immediately followed by
    • \s*foo - zero or more whitespace characters and the string foo
    • | - or
    • [0-9] - an ASCII digit.
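The breakdown above maps one-to-one onto the pattern, so as a sketch the same regex can be written with re.VERBOSE and inline comments (the variable name pattern is just illustrative). In verbose mode, literal whitespace in the pattern is ignored, but \s still matches whitespace in the input.

```python
import re

pattern = re.compile(r"""
    \b          # word boundary
    [0-9]+      # one or more ASCII digits
    (?!         # not immediately followed by
        \s*foo  #   zero or more whitespace characters and "foo"
      | [0-9]   #   or another ASCII digit
    )
""", re.VERBOSE)

s = '200 word1 some 50 foo and 5foo 30word2'
print(pattern.findall(s))  # -> ['200', '30']
```

The [0-9] alternative is what stops the engine from backtracking "50" down to "5": a shorter digit run would always be followed by a digit.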

Upvotes: 3

Tim Biegeleisen

Reputation: 522396

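You should use the pattern \b[0-9]+(?!\s*foo\b)(?=\D), which finds all numbers that are not followed by optional whitespace and the word foo. The (?=\D) lookahead requires a non-digit right after the number, so the engine cannot backtrack to a shorter digit run (such as the 5 in 50 foo).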

import re

s = '200 word1 some 50 foo and 5foo 30word2'
matches = re.findall(r'\b[0-9]+(?!\s*foo\b)(?=\D)', s)
print(matches)

This prints:

['200', '30']
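Both answers give the same result on the question's sample, but they differ on inputs it does not exercise. The sketch below compares them on a made-up string ('7 food 30word2 99' is a hypothetical input, not from the question), and also drops (?=\D) from the second pattern to show why that lookahead is there. The variable names are only labels for the two answers.

```python
import re

wiktor_pattern = r'\b[0-9]+(?!\s*foo|[0-9])'
tim_pattern = r'\b[0-9]+(?!\s*foo\b)(?=\D)'
tim_no_guard = r'\b[0-9]+(?!\s*foo\b)'  # same as tim_pattern minus (?=\D)

s = '200 word1 some 50 foo and 5foo 30word2'

# Without (?=\D), "50" backtracks to "5": the lookahead then sees "0 foo",
# which is not "optional whitespace + foo", so it passes.
print(re.findall(tim_no_guard, s))    # -> ['200', '5', '30']

# Hypothetical edge cases: a word that merely starts with "foo",
# and a number at the very end of the string.
edge = '7 food 30word2 99'
print(re.findall(tim_pattern, edge))     # -> ['7', '30']
print(re.findall(wiktor_pattern, edge))  # -> ['30', '99']
```

The foo\b in the second answer keeps "7 food", while \s*foo in the first also rejects numbers followed by any word starting with foo. Conversely, (?=\D) needs a character after the number, so a trailing "99" is dropped, whereas (?![0-9]) is also satisfied at the end of the string. Which behavior is correct depends on the real data.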

Upvotes: 2
