cookie1986

Reputation: 895

How to replace substring between two other substrings in python?

I have a corpus of text documents, some of which will have a sequence of substrings. The first and last substrings are consistent, and mark the beginning and the end of the parts I want to replace. But, I would also like to delete/replace all substrings that exist between these first and last positions.

origSent = 'This is the sentence I am intending to edit'

Using the above as an example, how would I go about using 'the' as the start substring, and 'intending' as the end substring, deleting both in addition to the words that exist between them to make the following:

newSent = 'This is to edit'

Upvotes: 0

Views: 625

Answers (2)

Bill Chen

Reputation: 1749

I would do this:

s_list = origSent.split()
start = s_list.index('the')        # position of the start word
end = s_list.index('intending')    # position of the end word
newSent = ' '.join(s_list[:start] + s_list[end + 1:])

Hope this helps.
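The split-and-index idea can also be wrapped in a small helper. This is only a minimal sketch: the name `drop_between_words` is hypothetical, it assumes both markers are whole whitespace-separated words, and it assumes the sentence should come back unchanged when either marker is missing.

```python
def drop_between_words(sentence, start, end):
    """Remove `start`, `end`, and every word between them (hypothetical helper)."""
    words = sentence.split()
    try:
        i = words.index(start)
        # Search for the end marker only after the start marker.
        j = words.index(end, i + 1)
    except ValueError:
        # A marker is missing: return the input untouched (an assumption).
        return sentence
    return ' '.join(words[:i] + words[j + 1:])


origSent = 'This is the sentence I am intending to edit'
newSent = drop_between_words(origSent, 'the', 'intending')
print(newSent)
```

Because the words are re-joined with single spaces, this version leaves no doubled space where the span was removed.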

Upvotes: 1

Tim Biegeleisen

Reputation: 520948

You could use regex replacement here:

import re

origSent = 'This is the sentence I am intending to edit'
newSent = re.sub(r'\bthe((?!\bthe\b).)*\bintending\b', '', origSent)
print(newSent)

This prints:

This is  to edit

The "secret sauce" in the regex pattern is the tempered dot:

((?!\bthe\b).)*

This consumes content one character at a time, but only while the next position does not begin another occurrence of the word "the". As a result, the match always starts at the last "the" before "intending" rather than at some earlier "the", which we don't want.
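To see the effect of the tempered dot, it may help to try the pattern on a sentence with two occurrences of "the". The sketch below uses a made-up sentence, compares against a plain greedy pattern, and adds a whitespace clean-up step that is not part of the answer above.

```python
import re

# Same pattern as in the answer above.
pattern = r'\bthe((?!\bthe\b).)*\bintending\b'

# Made-up example sentence with two occurrences of "the".
text = 'Read the book and the sentence I am intending to edit'

# Tempered dot: the match starts at the second "the".
result = re.sub(pattern, '', text)

# Plain greedy dot, for comparison: the match starts at the first "the".
naive = re.sub(r'\bthe\b.*\bintending\b', '', text)

# Optional extra step: collapse the doubled space left by the removal.
tidy = re.sub(r'\s+', ' ', result).strip()

print(repr(result))
print(repr(naive))
print(repr(tidy))
```

The final `re.sub(r'\s+', ' ', ...)` call is one way to get rid of the doubled space that the original replacement leaves behind.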

Upvotes: 1
