How to replace a substring between two other substrings in Python?

Question

I have a corpus of text documents, some of which contain a sequence of substrings. The first and last substrings are consistent and mark the beginning and end of the part I want to replace. I would also like to delete or replace everything that sits between those two positions.

origSent = 'This is the sentence I am intending to edit'

Using the above as an example, how would I go about using 'the' as the start substring, and 'intending' as the end substring, deleting both in addition to the words that exist between them to make the following:

newSent = 'This is to edit'

Tim Biegeleisen · Accepted Answer

You could use regex replacement here:

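import re

origSent = 'This is the sentence I am intending to edit'
# \bthe\b matches 'the' as a whole word only, so words like 'then' are not treated as the start marker
newSent = re.sub(r'\bthe\b((?!\bthe\b).)*\bintending\b', '', origSent)
print(newSent)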

This prints:

This is  to edit
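The output keeps the space on each side of the removed span, which is why there is a double space rather than the single space asked for in the question. A minimal sketch of one way to get 'This is to edit', assuming it is acceptable to also swallow any whitespace that follows the end marker (the trailing \s* is an addition, not part of the answer's pattern):

```python
import re

orig_sent = 'This is the sentence I am intending to edit'

# Variant of the answer's pattern with \s* appended (an assumption):
# it also removes whitespace directly after the end marker 'intending'.
pattern = r'\bthe\b((?!\bthe\b).)*\bintending\b\s*'

new_sent = re.sub(pattern, '', orig_sent)
print(new_sent)  # This is to edit
```

If the removed span can sit at the very end of a sentence, consuming the whitespace before the start marker instead may be the better choice.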

The "secret sauce" in the regex pattern is the tempered dot:

((?!\bthe\b).)*

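This consumes one character at a time, but only at positions where the word 'the' does not begin, so the match can never cross over another occurrence of 'the'. As a result, the match starts at the last 'the' before 'intending' rather than at some earlier 'the', which we don't want.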
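To see what the tempered dot buys you, here is a small sketch comparing it with a plain lazy dot on a sentence that contains an earlier 'the'. The helper name remove_between and the sample sentence are hypothetical, and the helper assumes both markers are plain whole words. It also uses a lazy quantifier so the match stops at the first end marker, which is a choice made here rather than something taken from the pattern above.

```python
import re

def remove_between(text, start, end):
    """Delete from the last `start` word before `end` through `end`.

    Hypothetical helper; `start` and `end` are treated as literal whole words.
    """
    s, e = re.escape(start), re.escape(end)
    # Tempered dot: a character is consumed only where `start` does not begin,
    # so the match cannot contain a second occurrence of `start`.
    pattern = rf'\b{s}\b(?:(?!\b{s}\b).)*?\b{e}\b'
    return re.sub(pattern, '', text)

text = 'Keep the first part, drop the sentence I am intending to edit'

# Tempered dot: only the span starting at the second 'the' is removed.
tempered = remove_between(text, 'the', 'intending')
print(tempered)  # Keep the first part, drop  to edit

# Plain lazy dot: the match starts at the first 'the' and removes too much.
naive = re.sub(r'\bthe\b.*?\bintending\b', '', text)
print(naive)  # Keep  to edit
```

Without the negative lookahead, the regex engine takes the leftmost possible start, so making the dot lazy does not help. The lookahead is what forces the start marker to be the nearest one.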
