Ishaan Patel

Reputation: 225

Removing Content Between Multiple Strings

Say hello to String S,

s = "X Hello C there. I am B a String. Y I C am a A good string."

What I want to do:

Now, I'm able to remove the content from X to C using:

re.sub('X.*?C','', s, flags=re.DOTALL)

How do I go about removing the text from C to B/Y/A? Would I need to iterate over a list, or can a regex do it?

Expected output: (Need to remove these)

Upvotes: 2

Views: 67

Answers (1)

Wiktor Stribiżew

Reputation: 626845

To remove the text from X up to the first occurrence of C, and then any text up to the first occurrence of B, Y, or A (keeping that character in the resulting string), you can use

X.*?C.*?(B|Y|A)

and replace with the \1 backreference. To match across lines, pass the re.DOTALL flag so that . also matches line break characters.

Details:

  • X - matches X
  • .*? - lazily matches any 0+ chars as few as possible up to the first...
  • C - C
  • .*? - lazily matches any 0+ chars as few as possible up to the first...
  • (B|Y|A) - (Group 1) either B, Y, or A.

The \1 backreference will put back the value inside Group 1.

Python demo (note the raw string literal used for the replacement pattern with the backreference):

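import re
# Group 1 captures the terminator (B, Y, or A) so it can be put back.
rx = r"X.*?C.*?(B|Y|A)"
s = "X Hello C there. I am B a String. Y I C am a A good string."
# The raw string keeps \1 as a backreference instead of an escape sequence.
print(re.sub(rx, r"\1", s))
# B a String. Y I C am a A good string.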
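If the text spans several lines, the same pattern should work once re.DOTALL is passed. Below is a small sketch of that, plus a lookahead variant that asserts the terminator without consuming it, so the replacement can be an empty string. The sample input `text` and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical multi-line input (an assumption, not from the question).
text = "keep X drop\nthis C and\nthis too Y tail"

# Capture-group form, with re.DOTALL so that . also matches "\n".
with_group = re.sub(r"X.*?C.*?(B|Y|A)", r"\1", text, flags=re.DOTALL)

# Lookahead form: B, Y, or A is asserted but not consumed,
# so nothing needs to be put back in the replacement.
with_lookahead = re.sub(r"X.*?C.*?(?=[BYA])", "", text, flags=re.DOTALL)

print(with_group)      # keep Y tail
print(with_lookahead)  # keep Y tail
```

Both forms give the same result here; the lookahead version just avoids the backreference in the replacement string.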

Upvotes: 2
