Reputation: 225
Say hello to String S,
s = "X Hello C there. I am B a String. Y I C am a A good string."
What I want to do:
Remove Content from X to C. (Done.)
Remove Content from C to B or A. (Note that C appears twice.)
Now, I'm able to remove the content from X to C using:
re.sub('X.*?C','', s, flags=re.DOTALL)
How do I go about removing the content from C to B/Y/A? Would I need to iterate over a list, or can a regex do it?
Upvotes: 2
Views: 67
Reputation: 626845
To remove text from X till the first occurrence of C, and then any text up to the first occurrence of B, Y, or A (keeping that character in the resulting string), you may use X.*?C.*?(B|Y|A) and replace with the \1 backreference. See the regex demo. To match across lines, use the re.DOTALL flag to make the dot (.) match line break chars.
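As a rough sketch of the re.DOTALL point, here is the same pattern run with and without the flag. The multi-line sample text is made up for illustration and is not from the question:

```python
import re

rx = r"X.*?C.*?(B|Y|A)"
# Hypothetical multi-line input, not taken from the question.
text = "X Hello\nC there.\nI am B a String."

# Without re.DOTALL, "." stops at "\n", so nothing matches and the text is unchanged.
without_flag = re.sub(rx, r"\1", text)
# With re.DOTALL, "." also matches "\n", so the match can span lines.
with_flag = re.sub(rx, r"\1", text, flags=re.DOTALL)

print(repr(without_flag))
print(repr(with_flag))
```

If the input never contains line breaks between X and the closing B, Y, or A, the flag makes no difference.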
Details:
X - matches X
.*? - lazily matches any 0+ chars, as few as possible, up to the first...
C - C
.*? - lazily matches any 0+ chars, as few as possible, up to the first...
(B|Y|A) - (Group 1) either B, Y, or A.
The \1 backreference will put back the value inside Group 1.
Python demo (note the raw string literal used for the replacement pattern containing the backreference):
import re
rx = r"X.*?C.*?(B|Y|A)"
s = "X Hello C there. I am B a String. Y I C am a A good string."
print(re.sub(rx, r"\1", s))
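To illustrate why the raw string literal matters for the replacement, the sketch below compares three spellings of it. The variable names good, bad, and also_good are only for illustration:

```python
import re

rx = r"X.*?C.*?(B|Y|A)"
s = "X Hello C there. I am B a String. Y I C am a A good string."

# Raw string: re.sub receives a backslash followed by "1", i.e. a backreference.
good = re.sub(rx, r"\1", s)
# Non-raw string: Python first turns "\1" into the control character chr(1),
# so the match is replaced with that character instead of Group 1.
bad = re.sub(rx, "\1", s)
# Escaping the backslash by hand is equivalent to the raw string.
also_good = re.sub(rx, "\\1", s)

print(repr(good))
print(repr(bad))
print(good == also_good)
```

Only the first X ... C ... B stretch is replaced here, since the string contains a single X.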
Upvotes: 2