Reputation: 115
I want to remove all digits (replacing each one with a space), except when the digits are part of one of the special substrings. In the example below, the special substrings that should be exempt from digit removal are 1s, 2s, s4, and 3s. I think I need to use a negative lookahead:
import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(?!1s|2s|s4|3s)[0-9\.]"
print(re.sub(pattern, ' ', s))
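To my understanding, the pattern above is:
(?!1s|2s|s4|3s) - a negative lookahead: do not match if the text ahead is 1s, 2s, s4 or 3s
[0-9\.] - a digit or a dot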
It all makes sense until you try it. For the sample s above, it returns "a 1s sa 2s3s as s af3s", which suggests that all the exclusion patterns work except when the digit is at the end of the special substring (s4), in which case it still gets matched?!
I believe this operation should return "a 1s sa 2s3s as4s4af3s". How can I fix my pattern?
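A quick diagnostic sketch shows what the lookahead actually inspects. A lookahead written immediately before [0-9\.] is evaluated at the position of the digit itself, so it only ever sees text that starts with that digit. The helper name lookahead_windows is hypothetical and not part of the original question.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"


def lookahead_windows(text, width=2):
    """Return (index, window) for every digit or dot in text.

    Hypothetical helper. The window starts at the digit itself, which is
    where a lookahead placed directly before [0-9.] begins matching.
    """
    return [(m.start(), text[m.start():m.start() + width])
            for m in re.finditer(r"[0-9.]", text)]


for index, window in lookahead_windows(s):
    print(index, window)
```

Every window starts with a digit, so the alternative s4, which starts with s, can never succeed at those positions; that is why the 4s at indices 15 and 17 are still replaced. Protecting a digit that ends a special substring requires checking the preceding character, i.e. a lookbehind.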
Upvotes: 3
Views: 62
Reputation: 195438
Try (regex101):
import re
s = "a61s8sa92s3s3as4s4af3s"
s = re.sub(r"(?!1s|2s|3s)(?<!s(?=4))[\d.]", " ", s)
print(s)
Prints:
a 1s sa 2s3s as4s4af3s
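The lookbehind idea above can also be generated from a list of special substrings instead of being written by hand. This is a minimal sketch, assuming the special substrings are plain literals; build_lookaround_pattern is a hypothetical helper. For a digit at index i of a substring, it requires the text before the digit to end with substring[:i] (a lookbehind) and the text from the digit onward to start with substring[i:]. For s4 this produces (?<=s)4 inside the negative lookahead, which excludes the same digits as (?<!s(?=4)) in the answer.

```python
import re


def build_lookaround_pattern(protected):
    """Build a pattern for digits/dots that are not part of a protected substring.

    Hypothetical helper. A digit at index i of a protected token is exempt when
    the text before it ends with token[:i] (lookbehind) and the text from the
    digit on starts with token[i:].
    """
    alternatives = []
    for token in protected:
        for i, ch in enumerate(token):
            if re.fullmatch(r"[\d.]", ch):
                prefix, rest = re.escape(token[:i]), re.escape(token[i:])
                alternatives.append(f"(?<={prefix}){rest}" if prefix else rest)
    if not alternatives:
        return r"[\d.]"  # nothing to protect: match every digit or dot
    return r"(?!" + "|".join(alternatives) + r")[\d.]"


s = "a61s8sa92s3s3as4s4af3s"
pattern = build_lookaround_pattern(["1s", "2s", "s4", "3s"])
print(pattern)                  # (?!1s|2s|(?<=s)4|3s)[\d.]
print(re.sub(pattern, " ", s))  # a 1s sa 2s3s as4s4af3s
```

Python's re only requires each lookbehind to be fixed width, which holds here because every prefix is an escaped literal.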
Upvotes: 0
Reputation: 626870
You can use
import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(1s|2s|s4|3s)|[\d.]"
print(re.sub(pattern, lambda x: x.group(1) or ' ', s))
# => a 1s sa 2s3s as4s4af3s
See the Python demo.
Details:
(1s|2s|s4|3s) - Group 1: 1s, 2s, s4 or 3s
| - or
[\d.] - a digit or dot.
If Group 1 matches, its value is used as the replacement; otherwise, the match is replaced with a space.
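The capture-group approach also generalizes to an arbitrary list of special substrings. This is a sketch under the same assumption (literal substrings); replace_unprotected is a hypothetical name. The substrings are escaped with re.escape and tried longest first, so a longer special substring is not cut short by a shorter one that happens to be its prefix.

```python
import re


def replace_unprotected(text, protected, replacement=" "):
    """Replace digits/dots with `replacement` unless they belong to a protected token.

    Hypothetical helper. Protected tokens are tried first, longest first; when
    one matches it is consumed and written back unchanged.
    """
    tokens = sorted((t for t in protected if t), key=len, reverse=True)
    if not tokens:
        return re.sub(r"[\d.]", lambda m: replacement, text)
    pattern = "(" + "|".join(map(re.escape, tokens)) + r")|[\d.]"
    # Group 1 is None when the [\d.] branch matched.
    return re.sub(pattern,
                  lambda m: m.group(1) if m.group(1) is not None else replacement,
                  text)


s = "a61s8sa92s3s3as4s4af3s"
print(replace_unprotected(s, ["1s", "2s", "s4", "3s"]))  # a 1s sa 2s3s as4s4af3s
print(replace_unprotected("x1s2", ["1s", "1s2"]))        # x1s2
```

Because a matched special substring is consumed, adjacent occurrences such as s4s4 are handled without any lookarounds.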
Upvotes: 2