Reputation: 895
I have a corpus of text documents, some of which contain a sequence of substrings. The first and last substrings are consistent and mark the beginning and end of the part I want to replace. I would also like to delete or replace every substring that falls between those two positions.
origSent = 'This is the sentence I am intending to edit'
Using the above as an example, how would I use 'the' as the start substring and 'intending' as the end substring, deleting both along with the words between them, to produce the following:
newSent = 'This is to edit'
Upvotes: 0
Views: 625
Reputation: 1749
I would do this:
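# Assumes both marker words are present; list.index raises ValueError otherwise.
s_list = origSent.split()
start = s_list.index('the')        # position of the start word
end = s_list.index('intending')    # position of the end word
# Keep everything before the start word and everything after the end word.
newSent = ' '.join(s_list[:start] + s_list[end + 1:])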
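If some documents in the corpus lack one of the markers, the split-and-index approach can be wrapped in a small helper. This is only a sketch: the name remove_span is made up for illustration, and returning the sentence unchanged when a marker is missing is an assumption about what is wanted in that case.

```python
def remove_span(sentence, start, end):
    """Drop the words from the first `start` through the next `end`, inclusive.

    `remove_span` is a hypothetical helper name, not part of any library.
    """
    words = sentence.split()
    try:
        i = words.index(start)
        j = words.index(end, i + 1)  # look for `end` only after `start`
    except ValueError:
        # Assumption: if either marker is missing, leave the sentence as it is.
        return sentence
    return ' '.join(words[:i] + words[j + 1:])


origSent = 'This is the sentence I am intending to edit'
print(remove_span(origSent, 'the', 'intending'))  # This is to edit
```

Like the one-liner, this compares whole whitespace-separated words, so a marker followed by punctuation (for example 'intending,') would not be found.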
Hope this helps.
Upvotes: 1
Reputation: 520948
You could use regex replacement here:
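import re

origSent = 'This is the sentence I am intending to edit'
# \bthe\b keeps 'then' or 'there' from matching as the start word;
# the trailing \s* removes the space left behind so no double space remains.
newSent = re.sub(r'\bthe\b((?!\bthe\b).)*\bintending\b\s*', '', origSent)
print(newSent)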
This prints:
This is to edit
The "secret sauce" in the regex pattern is the tempered dot:
((?!\bthe\b).)*
This will consume all content which does not cross over another occurrence of the word 'the'. This prevents matching on some earlier 'the' before 'intending', which we don't want to do.
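To see what the tempered dot buys you, here is a small comparison against a plain lazy match on a sentence with two occurrences of the start word. The sample sentence and variable names are invented for illustration, and the trailing \s* in both patterns is only there to swallow the leftover space.

```python
import re

# Hypothetical sample with two occurrences of the start word 'the'.
text = 'Keep the first part but drop the sentence I am intending to edit'

# Tempered dot: the match may not cross another standalone 'the',
# so it can only begin at the last 'the' before 'intending'.
tempered = r'\bthe\b((?!\bthe\b).)*\bintending\b\s*'
tempered_result = re.sub(tempered, '', text)

# Lazy dot: the match begins at the first 'the' it finds.
lazy = r'\bthe\b.*?\bintending\b\s*'
lazy_result = re.sub(lazy, '', text)

print(tempered_result)  # Keep the first part but drop to edit
print(lazy_result)      # Keep to edit
```

Which behaviour is right depends on the corpus: the tempered version removes the shortest span ending at 'intending', while the lazy version removes everything from the first 'the' onward.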
Upvotes: 1