Brian Waters

Reputation: 629

Replace captured groups with empty string in python

I currently have a string similar to the following:

str = 'abcHello Wor=A9ld'

What I want to do is find the 'abc' and '=A9' and replace these matched groups with an empty string, such that my final string is 'Hello World'.

I am currently using this regex, which is correctly finding the groups I want to replace:

r'^(abc).*?(=[A-Z0-9]+)'

I have tried to replace these groups using the following code:

clean_str = re.sub(r'^(abc).*?(=[A-Z0-9]+)', '', str)

Using the above code has resulted in:

print(clean_str)
>>> 'ld'

My question is, how can I use re.sub to replace these groups with an empty string and obtain my 'Hello World'?

Upvotes: 5

Views: 6910

Answers (4)

Wiktor Stribiżew

Reputation: 627022

Is there a way that I can .. ensure that abc is present, otherwise don't replace the second pattern?

I understand that you need to first check if the string starts with abc, and if yes, remove the abc and all instances of =[0-9A-Z]+ pattern in the string.

I recommend:

import re
s="abcHello wo=A9rld"
if s.startswith('abc'):
    print(re.sub(r'=[A-Z0-9]+', '', s[3:]))

Here, if s.startswith('abc'): checks whether the string begins with abc, s[3:] slices off those first three characters, and re.sub then removes all non-overlapping matches of the =[A-Z0-9]+ pattern.
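
The same check can be wrapped in a small helper. This is only a sketch: the name strip_prefix_and_codes and its prefix parameter are made up for illustration, and it assumes a string without the prefix should come back unchanged.

```python
import re

def strip_prefix_and_codes(s, prefix='abc'):
    """Hypothetical helper: drop a leading prefix, then every =XX-style code."""
    if not s.startswith(prefix):
        # Assumption: strings without the prefix are left untouched.
        return s
    # Slice off the prefix, then remove all non-overlapping =[A-Z0-9]+ matches.
    return re.sub(r'=[A-Z0-9]+', '', s[len(prefix):])

print(strip_prefix_and_codes('abcHello Wor=A9ld'))  # Hello World
print(strip_prefix_and_codes('Hello Wor=A9ld'))     # Hello Wor=A9ld
```

Using len(prefix) instead of a hard-coded 3 keeps the slice correct if the prefix ever changes.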

Note that you may use the PyPI regex module to do the same with a single regex:

import regex
r = regex.compile(r'^abc|(?<=^abc.*?)=[A-Z0-9]+', regex.S)
print(r.sub('', 'abcHello Wor=A9ld=B56')) # => Hello World
print(r.sub('', 'Hello Wor=A9ld'))        # => Hello Wor=A9ld

See an online Python demo

Here,

  • ^abc - abc at the start of the string only
  • | - or
  • (?<=^abc.*?) - a lookbehind that requires abc at the start of the input, followed by any number of chars (line breaks included, since regex.S is set), immediately to the left of the current location. Variable-width lookbehinds like this one are not supported by the built-in re module.
  • =[A-Z0-9]+ - a = followed by 1+ uppercase ASCII letters/digits.

Upvotes: 1

Ali Yusuf

Reputation: 347

This worked for me.

re.sub(r'^(abc)(.*?)(=[A-Z0-9]+)(.*?)$', r"\2\4", str)
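
For context on why the extra groups are needed: re.sub replaces the entire match, not just the captured groups, so everything between abc and =A9 is discarded along with them. A rough sketch of that, using the sample string from the question (the variable names are only illustrative):

```python
import re

text = 'abcHello Wor=A9ld'

# The pattern from the question: the match runs from 'abc' through '=A9'.
m = re.search(r'^(abc).*?(=[A-Z0-9]+)', text)
whole_match = m.group(0)  # 'abcHello Wor=A9'

# re.sub swaps the whole match for '', so only the unmatched tail remains.
question_result = re.sub(r'^(abc).*?(=[A-Z0-9]+)', '', text)  # 'ld'

# Capturing the parts to keep and writing them back as \2 and \4 restores them.
fixed = re.sub(r'^(abc)(.*?)(=[A-Z0-9]+)(.*?)$', r'\2\4', text)
print(fixed)  # Hello World
```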

Upvotes: 2

sachin_hg

Reputation: 136

This is a naïve approach but why can't you use replace twice instead of regex, like this:

str = str.replace('abc','')
str = str.replace('=A9','')

print(str) #'Hello World'
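
One thing to watch for with this approach, shown in a quick sketch below: replace removes the literal text wherever it occurs, not only at the start, and it only knows about the exact code '=A9'. The second input string is invented purely to illustrate that.

```python
# Works for the exact sample from the question.
sample = 'abcHello Wor=A9ld'
print(sample.replace('abc', '').replace('=A9', ''))  # Hello World

# Invented input: a different code and a second 'abc' later in the string.
other = 'abcHello Wor=B7ld (abc)'
result = other.replace('abc', '').replace('=A9', '')
# Both 'abc' occurrences are removed, and '=B7' is left in place.
print(result)  # Hello Wor=B7ld ()
```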

Upvotes: 1

Alex Hall

Reputation: 36033

Capture everything else and put those groups in the replacement, like so:

re.sub(r'^abc(.*?)=[A-Z0-9]+(.*)', r'\1\2', s)
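
A quick way to see how this behaves on a few inputs, as a sketch (the inputs other than the question's sample are made up). Note that the pattern only consumes the first =XX code, and a string that does not start with abc is returned as-is because nothing matches.

```python
import re

pattern = re.compile(r'^abc(.*?)=[A-Z0-9]+(.*)')

# Sample from the question: both captured pieces are glued back together.
first = pattern.sub(r'\1\2', 'abcHello Wor=A9ld')
print(first)   # Hello World

# No leading 'abc': the pattern does not match, so nothing is replaced.
second = pattern.sub(r'\1\2', 'Hello Wor=A9ld')
print(second)  # Hello Wor=A9ld

# Two codes: only the first is removed, the rest lands in group 2 untouched.
third = pattern.sub(r'\1\2', 'abcHello Wor=A9ld=B56')
print(third)   # Hello World=B56
```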

Upvotes: 4
