Reputation: 1184

Why can I not use re.sub to replace a group?

My goal is to find a group in a string using regex and replace it with a space.

The group I want to match is a run of symbols, but only when it falls between words. With re.findall() this works exactly as expected:

import re

word = 'This##Is # A # Test#'
print(word)
print(re.findall(r"[a-zA-Z\s]*([\$\#\%\!\s]*)[a-zA-Z]", word))
# ['##', '# ', '# ', '']

But when I use re.sub(), instead of replacing only the group, it replaces the entire match.

x = re.sub(r"[a-zA-Z\s]*([\$\#\%\!\s]*)[a-zA-Z]", r' ', word)
print(repr(x))
# '    #'

How can I use regular expressions to replace ONLY the group? The outcome I expect is:

'This Is A Test#'

Upvotes: 1

Answers (3)

Reputation: 43169

First, there's no need to escape every "magic" character within a character class; [$#%!\s]* is equally fine and much more readable.
Second, matching (i.e. retrieving) is different from replacing: re.sub replaces the whole match, not just the captured group, so you could use backreferences to keep the parts you want.
Third, if you only want to keep the # at the end, a much simpler expression does the job:
```
(?:[\s#](?!\Z))+
```
Each match would then be replaced by a single space; see a demo on regex101.com.

In Python this could be:
```
import re

string = "This##Is # A # Test#"
rx = re.compile(r'(?:[\s#](?!\Z))+')

new_string = rx.sub(' ', string)
print(new_string)
# This Is A Test#
```
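
To make the second point concrete, here is a minimal sketch (the variable names `pattern`, `groups` and `whole_matches` are illustrative, not from the question). It shows that `re.findall` reports only the captured group, while `re.sub` replaces the whole match, i.e. group 0:

```python
import re

word = "This##Is # A # Test#"
pattern = re.compile(r"[a-zA-Z\s]*([$#%!\s]*)[a-zA-Z]")

# With exactly one capturing group, findall returns only that group's text.
groups = pattern.findall(word)

# finditer exposes what was actually matched: group(0) is the whole match,
# and the whole match is what re.sub replaces.
whole_matches = [m.group(0) for m in pattern.finditer(word)]

print(groups)                        # ['##', '# ', '# ', '']
print(whole_matches)                 # ['This##I', 's # A', ' # T', 'est']
print(repr(pattern.sub(" ", word)))  # '    #'
```

The four whole matches cover every character except the final #, which is why the substitution leaves four spaces followed by #.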

Upvotes: 1

Reputation: 189377

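The problem is that your regex matches far more than the symbols: it also consumes the surrounding letters, and re.sub replaces everything it matched. Match only the run of symbols, anchored between word boundaries: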

x = re.sub(r'\b[$#%!\s]+\b', ' ', word)
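
A quick check of the `\b` approach, as a sketch: a word boundary needs a word character (letter, digit or underscore) on one side, so a run of symbols is only matched when it sits between two word characters, and the trailing # is left alone. The function name `collapse_inner_symbols` and the second test string are made up for illustration.

```python
import re

def collapse_inner_symbols(text):
    """Replace runs of $, #, %, ! and whitespace that sit between word characters."""
    return re.sub(r"\b[$#%!\s]+\b", " ", text)

print(collapse_inner_symbols("This##Is # A # Test#"))  # This Is A Test#
print(collapse_inner_symbols("#Lead!!Trail$"))         # #Lead Trail$
```

Note that `\b` treats digits and underscores as word characters too, so this is slightly broader than the letters-only class in the question.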

Upvotes: 0

Reputation: 106543

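You can group the portion of the pattern you want to retain and use a backreference in your replacement string instead. Require at least one symbol (+ rather than *) and check the following letter with a lookahead, so that plain words and single-letter words are left intact:

x = re.sub(r"([a-zA-Z])[$#%!\s]+(?=[a-zA-Z])", r'\1 ', word)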
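
If you would rather keep the symbols in a capturing group and replace only that group's span, one option is to pass a function as the replacement. The sketch below is an assumption-laden illustration: `sub_group` is a hypothetical helper, not part of the `re` module, and it assumes the chosen group always participates in the match.

```python
import re

def sub_group(pattern, replacement, text, group=1):
    """Replace only the span of one capturing group, keeping the rest of each match."""
    def keep_context(match):
        whole = match.group(0)
        # Offsets of the group relative to the start of the whole match.
        start = match.start(group) - match.start()
        end = match.end(group) - match.start()
        return whole[:start] + replacement + whole[end:]
    return re.sub(pattern, keep_context, text)

word = "This##Is # A # Test#"
# A letter, then the symbol run (group 1), then a lookahead for the next letter.
pattern = r"[a-zA-Z]([$#%!\s]+)(?=[a-zA-Z])"
print(sub_group(pattern, " ", word))  # This Is A Test#
```

The lookahead matters here: because the following letter is not consumed, a single-letter word such as A can still serve as the leading letter of the next match.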

Upvotes: 0