MarcelD

Reputation: 115

Python regex to substitute all digits except when they are part of a substring

I want to remove all digits, except when the digits are part of one of a set of special substrings. In the example below, the special substrings that should be skipped by the digit removal are 1s, 2s, s4 and 3s. I think I need to use a negative lookahead:

import re

s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(?!1s|2s|s4|3s)[0-9\.]"
re.sub(pattern, ' ', s)

To my understanding, the pattern above is:

It all makes sense until you try it. The sample s above returns a 1s sa 2s3s as s af3s, which suggests that the exclusions work except when the digit is at the end of the special substring (as in s4), in which case the digit still gets matched.

I believe this operation should return a 1s sa 2s3s as4s4af3s. How can I fix my pattern?

Upvotes: 3

Views: 62

Answers (2)

Andrej Kesely

Reputation: 195438

Try (regex101):

import re

s = "a61s8sa92s3s3as4s4af3s"

s = re.sub(r"(?!1s|2s|3s)(?<!s(?=4))[\d.]", " ", s)
print(s)

Prints:

a 1s sa 2s3s as4s4af3s
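A likely reason the lookbehind is needed: a negative lookahead is tested at the position where the digit itself starts, so it can only protect digits that begin a special substring (1s, 2s, 3s). In s4 the digit comes last, so the check has to look at the preceding character instead. The sketch below compares the two patterns; the names original, fixed and extra are illustrative and not part of the question.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"

# Pattern from the question: the lookahead is evaluated at the digit, so "s4"
# in the alternation can never apply (a digit position never starts with "s").
original = r"(?!1s|2s|s4|3s)[0-9.]"

# Pattern from this answer: the lookbehind rejects a "4" preceded by "s".
fixed = r"(?!1s|2s|3s)(?<!s(?=4))[\d.]"

orig_hits = [(m.start(), m.group()) for m in re.finditer(original, s)]
fixed_hits = [(m.start(), m.group()) for m in re.finditer(fixed, s)]

# Digits that only the original pattern replaces.
extra = sorted(set(orig_hits) - set(fixed_hits))
print(extra)
# => [(15, '4'), (17, '4')]
```

The two extra matches are exactly the 4s inside the two s4 substrings, which matches the behaviour described in the question.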

Upvotes: 0

Wiktor Stribiżew

Reputation: 626870

You can use

import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(1s|2s|s4|3s)|[\d.]"
print( re.sub(pattern, lambda x: x.group(1) or ' ', s) )
# => a 1s sa 2s3s as4s4af3s

See the Python demo.

Details:

  • (1s|2s|s4|3s) - Group 1: 1s, 2s, s4 or 3s
  • | - or
  • [\d.] - a digit or dot.

If Group 1 matches, its value is used as the replacement; otherwise, the replacement is a space.
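If the list of special substrings changes often, the same capture-or-replace idea could be wrapped in a small helper. This is only a sketch: the function name strip_digits_except and its parameters are hypothetical, and it assumes the protected substrings should be treated as literal text (hence re.escape).

```python
import re

def strip_digits_except(text, keep, replacement=" "):
    """Replace digits and dots with `replacement`, leaving substrings in `keep` intact."""
    # Drop empty strings: an empty alternative would match everywhere.
    keep = [k for k in keep if k]
    if not keep:
        return re.sub(r"[\d.]", replacement, text)
    # Longest first, so a longer protected substring wins over a shorter prefix.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keep, key=len, reverse=True)
    )
    pattern = re.compile(rf"({alternatives})|[\d.]")
    # Group 1 is set only when a protected substring matched; keep it as-is.
    return pattern.sub(lambda m: m.group(1) or replacement, text)

print(strip_digits_except("a61s8sa92s3s3as4s4af3s", ["1s", "2s", "s4", "3s"]))
# => a 1s sa 2s3s as4s4af3s
```

Because the protected substrings are consumed as whole matches, it does not matter whether the digit is at the start or the end of the substring.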

Upvotes: 2
