ai.jennetta

Reputation: 1184

Why can I not use re.sub to replace a group?

My goal is to find a group in a string using regex and replace it with a space.

The group I want to find is a run of symbols, but only when it falls between letters. When I use re.findall() it works exactly as expected:

import re

word = 'This##Is # A # Test#'
print(word)
re.findall(r"[a-zA-Z\s]*([\$\#\%\!\s]*)[a-zA-Z]",word)
>>> ['##', '# ', '# ', '']

But when I use re.sub(), instead of replacing only the group, it replaces the entire match:

x = re.sub(r"[a-zA-Z\s]*([\$\#\%\!\s]*)[a-zA-Z]",r' ',word)
print(x)
>>> '    #'

How can I use regular expressions to replace ONLY the group? The outcome I expect is:

'This Is A Test#'

Upvotes: 1

Views: 83

Answers (3)

Jan

Reputation: 43169

  1. First, there's no need to escape every "magic" character within a character class; [$#%!\s]* works equally well and is much more readable.

  2. Second, matching (i.e. retrieving) is different from replacing: re.sub() replaces the whole match, not just the group, so you could use backreferences to put back the parts you want to keep.

  3. Third, if the only symbol you want to keep is a # at the very end of the string, a much simpler expression will do:

    (?:[\s#](?!\Z))+
    

    Each match is then replaced by a single space; see a demo on regex101.com.


    In Python this could be:

    import re
    
    string = "This##Is # A # Test#"
    rx = re.compile(r'(?:[\s#](?!\Z))+')
    
    new_string = rx.sub(' ', string)
    print(new_string)
    # This Is A Test#
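
Points 1 and 3 above can be checked with a short sketch. The names `escaped`, `plain`, `same_matches` and `trailing_run`, and the extra sample string 'A#B##', are illustrative assumptions and not part of the original question. The second half suggests that the lookahead only protects the very last character of the string, so a longer trailing run of symbols is still partly replaced.

```python
import re

word = "This##Is # A # Test#"

# Point 1: inside a character class, $ # % ! need no backslash.
escaped = re.compile(r"[\$\#\%\!\s]+")
plain = re.compile(r"[$#%!\s]+")
same_matches = escaped.findall(word) == plain.findall(word)
print(same_matches)

# Point 3: (?!\Z) rejects a symbol only when it is the last character
# of the string, so only the final '#' of a trailing run survives.
rx = re.compile(r"(?:[\s#](?!\Z))+")
trailing_run = rx.sub(" ", "A#B##")  # hypothetical extra input
print(trailing_run)
```

If several trailing symbols must be kept, the lookahead would need to be adapted; that case is not covered by the question.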
    

Upvotes: 1

tripleee

Reputation: 189377

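The problem is that your regex matches far more than the symbols: the letters and whitespace around them are part of the match too, and re.sub() replaces the whole match, not just the captured group. Match only the run of symbols, anchored between word boundaries: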

x = re.sub(r'\b[$#%!\s]+\b', ' ', word)
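
To see the difference in practice, the sketch below lists each full match next to its captured group for the pattern from the question, then applies the word-boundary pattern. It assumes the `word` string from the question; `original` and `full_vs_group` are illustrative names. Keep in mind that `\b` treats digits and underscores as word characters as well as letters.

```python
import re

word = 'This##Is # A # Test#'

# re.sub replaces group(0), the whole match, not just group(1).
original = re.compile(r"[a-zA-Z\s]*([\$\#\%\!\s]*)[a-zA-Z]")
full_vs_group = [(m.group(0), m.group(1)) for m in original.finditer(word)]
for full, group in full_vs_group:
    print(repr(full), repr(group))

# Matching only the symbol run between two word boundaries leaves the
# surrounding letters, and the trailing '#', untouched.
x = re.sub(r'\b[$#%!\s]+\b', ' ', word)
print(x)  # This Is A Test#
```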

Upvotes: 0

blhsing

Reputation: 106543

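You can capture the part of the pattern you want to retain and use a backreference to it in your replacement string instead. Keep the group to the single letter before the symbols, and check the letter after them with a lookahead so that it is not consumed:

x = re.sub(r"([a-zA-Z])[$#%!\s]+(?=[a-zA-Z])", r'\1 ', word)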
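
One thing worth checking with the backreference approach is what each group actually captures, because everything in a group is written back by `\1` or `\2`. In the sketch below, `loose` is a pattern whose first group also accepts whitespace, and `tight` captures a single letter and uses a lookahead. Both names and the side-by-side comparison are illustrative assumptions, not a definitive solution.

```python
import re

word = 'This##Is # A # Test#'

# A first group that accepts whitespace carries that whitespace back
# into the result, and the final match splits "Test" when it backtracks.
loose = re.compile(r"([a-zA-Z\s]*)[\$\#\%\!\s]*([a-zA-Z])")
loose_result = loose.sub(r'\1 \2', word)
print(repr(loose_result))  # 'This Is  A  Tes t#'

# Capturing only the letter before the run, and requiring a letter after
# it with (?=...), keeps adjacent matches from sharing or losing letters.
tight = re.compile(r"([a-zA-Z])[$#%!\s]+(?=[a-zA-Z])")
tight_result = tight.sub(r'\1 ', word)
print(repr(tight_result))  # 'This Is A Test#'
```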

Upvotes: 0
