Reputation: 957
I am trying to find the numbers that are not followed by certain words, using regular expressions in Python 3. My guess is that a negative lookaround is needed, but I'm struggling with the optional whitespace. See the following example:
'200 word1 some 50 foo and 5foo 30word2'
Note that in reality word1 and word2 can be any of many different words, which makes a positive match on them impractical. It is therefore easier to exclude the numbers followed by foo. The expected result is:
[200, 30]
My attempt (on a slightly different test string):
import re
s = '200 foo some 50 bar and 5bar 30foo'
pattern = r"[0-9]+\s?(?!foo)"
re.findall(pattern, s)
This returns:
['200', '50 ', '5', '3']
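The unexpected output comes from backtracking. Here is a minimal sketch that prints each match of the attempt together with its span, using the same string and pattern. When (?!foo) fails right after 30, the engine gives back a digit and retries with 3, whose next character 0 is not foo, so 3 matches. In the same way, \s? gives back the space after 200, so the lookahead tests ' foo' instead of 'foo' and succeeds.

```python
import re

# Same test string and pattern as in the attempt above.
s = '200 foo some 50 bar and 5bar 30foo'
attempt = r"[0-9]+\s?(?!foo)"

# Print every match with its (start, end) span to expose the backtracking.
for m in re.finditer(attempt, s):
    print(repr(m.group()), m.span())
# '200' (0, 3)    <- \s? gave back the space, so (?!foo) saw ' foo'
# '50 ' (13, 16)
# '5' (24, 25)
# '3' (29, 30)    <- [0-9]+ gave back the 0, so (?!foo) saw '0foo'
```

Moving the optional whitespace inside the lookahead (\s*foo) and forbidding a following digit, as the answers below do, removes both escape routes.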
Upvotes: 1
Views: 143
Reputation: 627292
You may use
import re
s = '200 word1 some 50 foo and 5foo 30word2'
pattern = r"\b[0-9]+(?!\s*foo|[0-9])"
print(re.findall(pattern, s))
# => ['200', '30']
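To see why each piece of this pattern is needed, here is a small comparison sketch on the question's string; the variant names are made up for illustration. Dropping |[0-9] lets the engine backtrack from 50 to 5, and dropping \b lets digits inside words such as word1 and word2 match.

```python
import re

s = '200 word1 some 50 foo and 5foo 30word2'

variants = {
    # The full pattern from this answer.
    "full": r"\b[0-9]+(?!\s*foo|[0-9])",
    # Without |[0-9]: backtracking lets '5' match out of '50 foo'.
    "no_digit_guard": r"\b[0-9]+(?!\s*foo)",
    # Without \b: digits at the end of words like 'word1' can match.
    "no_boundary": r"[0-9]+(?!\s*foo|[0-9])",
}

for name, pattern in variants.items():
    print(name, re.findall(pattern, s))
# full ['200', '30']
# no_digit_guard ['200', '5', '30']
# no_boundary ['200', '1', '30', '2']
```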
Details
- \b - a word boundary
- [0-9]+ - 1+ ASCII digits only
- (?!\s*foo|[0-9]) - not immediately followed by either
  - \s*foo - 0+ whitespace characters and the string foo
  - | - or
  - [0-9] - an ASCII digit.
Upvotes: 3
Reputation: 522396
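You should be using the pattern \b[0-9]+(?!\s*foo\b)(?=\D), which finds all numbers that are not followed by optional whitespace and the word foo. The (?=\D) lookahead additionally requires a non-digit right after the number, so the engine cannot backtrack to a shorter run of digits (for example, matching 5 out of 50 foo).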
import re
s = '200 word1 some 50 foo and 5foo 30word2'
matches = re.findall(r'\b[0-9]+(?!\s*foo\b)(?=\D)', s)
print(matches)
This prints:
['200', '30']
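The two answers' patterns agree on the question's string but differ on edge cases. A quick comparison sketch, using made-up sample strings chosen only for illustration: because of (?=\D), the second pattern skips a number at the very end of the string, and because of foo\b, it only excludes foo as a whole word, so 7foobar still yields 7. Which behavior is wanted depends on the real data.

```python
import re

# Patterns copied from the two answers above.
answer1 = r"\b[0-9]+(?!\s*foo|[0-9])"
answer2 = r"\b[0-9]+(?!\s*foo\b)(?=\D)"

# Hypothetical sample inputs, for illustration only.
samples = [
    '200 word1 some 50 foo and 5foo 30word2',  # the question's example
    'ends with 42',                            # number at the very end
    '7foobar 8 foo',                           # 'foo' as part of a longer word
]

for text in samples:
    print(repr(text), re.findall(answer1, text), re.findall(answer2, text))
# '200 word1 some 50 foo and 5foo 30word2' ['200', '30'] ['200', '30']
# 'ends with 42' ['42'] []
# '7foobar 8 foo' [] ['7']
```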
Upvotes: 2