Reputation: 629
I currently have a string similar to the following:
str = 'abcHello Wor=A9ld'
What I want to do is find the 'abc' and '=A9' and replace these matched groups with an empty string, such that my final string is 'Hello World'.
I am currently using this regex, which is correctly finding the groups I want to replace:
r'^(abc).*?(=[A-Z0-9]+)'
I have tried to replace these groups using the following code:
clean_str = re.sub(r'^(abc).*?(=[A-Z0-9]+)', '', str)
Using the above code has resulted in:
print(clean_str)
>>> 'ld'
My question is, how can I use re.sub to replace these groups with an empty string and obtain my 'Hello World'?
Upvotes: 5
Views: 6910
Reputation: 627022
Is there a way that I can .. ensure that abc is present, otherwise don't replace the second pattern?
I understand that you need to first check whether the string starts with abc and, if it does, remove the abc and all instances of the =[0-9A-Z]+ pattern in the string.
I recommend:
import re
s = "abcHello wo=A9rld"
if s.startswith('abc'):
    # Drop the 3-character prefix, then strip every =XX token from the rest
    print(re.sub(r'=[A-Z0-9]+', '', s[3:]))  # Hello world
Here, if s.startswith('abc'): checks whether the string has abc at the beginning, then s[3:] slices off the leading abc, and then re.sub removes all non-overlapping instances of the =[A-Z0-9]+ pattern.
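As a minimal sketch, the same check-then-strip logic can be wrapped in a function. The name strip_prefix_and_codes and its prefix parameter are hypothetical, and returning the input unchanged when the prefix is missing is an assumption about the desired behavior:

```python
import re

def strip_prefix_and_codes(s, prefix="abc"):
    """Hypothetical helper: strip `prefix` and all =XX tokens, but only if `s` starts with `prefix`."""
    if not s.startswith(prefix):
        # Assumption: input without the prefix is returned untouched
        return s
    # Slice off the prefix, then remove every =[A-Z0-9]+ token from the rest
    return re.sub(r'=[A-Z0-9]+', '', s[len(prefix):])

print(strip_prefix_and_codes("abcHello wo=A9rld"))  # Hello world
print(strip_prefix_and_codes("Hello wo=A9rld"))     # Hello wo=A9rld
```

Using len(prefix) instead of a hard-coded 3 keeps the slice correct if the prefix changes.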
Note that you may use the PyPI regex module to do the same with one regex:
import regex
r = regex.compile(r'^abc|(?<=^abc.*?)=[A-Z0-9]+', regex.S)
print(r.sub('', 'abcHello Wor=A9ld=B56')) # Hello World
print(r.sub('', 'Hello Wor=A9ld')) # => Hello Wor=A9ld
Here,
^abc - abc at the start of the string only
| - or
(?<=^abc.*?) - check that there is abc at the start of the input, followed by any number of chars (including line break chars, since regex.S is set), immediately to the left of the current location
=[A-Z0-9]+ - a = followed by 1+ uppercase ASCII letters/digits.
Upvotes: 1
Reputation: 347
This worked for me.
re.sub(r'^(abc)(.*?)(=[A-Z0-9]+)(.*?)$', r"\2\4", str)
Upvotes: 2
Reputation: 136
This is a naïve approach, but why not use replace twice instead of a regex, like this:
str = str.replace('abc','')
str = str.replace('=A9','')
print(str) #'Hello World'
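One caveat worth checking before relying on this: str.replace removes every occurrence of the literal, not just a leading one, and it only handles the exact token '=A9'. A small sketch (the function name replace_twice and the second sample string are made up for illustration):

```python
def replace_twice(s):
    # Removes 'abc' and '=A9' wherever they occur, not only at the start
    return s.replace('abc', '').replace('=A9', '')

print(replace_twice('abcHello Wor=A9ld'))          # Hello World
print(replace_twice('abcHello Wor=A9ld, abcdef'))  # Hello World, def
```

If other codes such as =B56 can appear, or abc may legitimately occur later in the text, one of the regex-based answers is probably a safer fit.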
Upvotes: 1
Reputation: 36033
Capture everything else and put those groups in the replacement, like so:
re.sub(r'^abc(.*?)=[A-Z0-9]+(.*)', r'\1\2', s)
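To illustrate why the original call left only 'ld': re.sub replaces the entire match, not just the captured groups, and the match in the question spans 'abcHello Wor=A9'. The sketch below compares both calls on the question's sample string; the variable names are chosen only for this example:

```python
import re

s = 'abcHello Wor=A9ld'

# The whole match ('abcHello Wor=A9') is replaced, so only 'ld' survives
whole = re.sub(r'^(abc).*?(=[A-Z0-9]+)', '', s)

# Capture the text to keep and re-emit it through backreferences
kept = re.sub(r'^abc(.*?)=[A-Z0-9]+(.*)', r'\1\2', s)

print(whole)  # ld
print(kept)   # Hello World
```

Note that this pattern removes only the first =XX token after abc; any later tokens end up inside the second group and are kept.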
Upvotes: 4