Reputation: 15
I have a list of strings, and I want to extract every string between the two strings that contain specific keywords (including those two strings).
example_list = ['test sentence', 'the sky is blue', 'it is raining outside', 'mic check', 'vacation time']
keywords = ['sky', 'check']
The result I want to achieve:
result = ['the sky is blue', 'it is raining outside', 'mic check']
So far, I couldn't figure it out myself. Maybe it is possible with two loops and using regex?
Upvotes: 0
Views: 484
Reputation: 472
For each keyword, you have to check whether it appears in each sentence, so you'll need two loops.
The simplest way is to use the positions (indexes) of the sentences in the example list:
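import numpy as np

example_list = ['test sentence', 'the sky is blue', 'it is raining outside', 'mic check', 'vacation time']
keywords = ['sky', 'check']

indexes = []
for k in keywords:
    # enumerate gives each position directly; list.index() would always
    # return the first occurrence if the same sentence appeared twice
    for i, sentence in enumerate(example_list):
        if k in sentence:
            indexes.append(i)

# Slice from the first to the last matching position (inclusive)
result = example_list[np.min(indexes):np.max(indexes) + 1]
print(result)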
It will return:
['the sky is blue', 'it is raining outside', 'mic check']
Upvotes: 0
Reputation: 95901
A generator solution that would work with any sequence of strings, not just a list:
def included(seq, start_text, end_text):
    do_yield = False
    for text in seq:
        if not do_yield and start_text in text:
            do_yield = True
        if do_yield:
            yield text
            if end_text in text:
                break
You can convert the result to a list, of course.
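As a minimal usage sketch, here is the generator applied to the list from the question and to a non-list source. The function is repeated so the snippet runs on its own, and `lines()` is a hypothetical stand-in for any other iterable of strings:

```python
def included(seq, start_text, end_text):
    do_yield = False
    for text in seq:
        if not do_yield and start_text in text:
            do_yield = True
        if do_yield:
            yield text
            if end_text in text:
                break


example_list = ['test sentence', 'the sky is blue', 'it is raining outside',
                'mic check', 'vacation time']

# Collect the yielded strings into a list
result = list(included(example_list, 'sky', 'check'))
print(result)
# ['the sky is blue', 'it is raining outside', 'mic check']


def lines():
    # Hypothetical source: any iterable works, not just a list
    yield from example_list


result_from_gen = list(included(lines(), 'sky', 'check'))
```

Note that this stops at the first string containing the end keyword after the start has been found.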
Upvotes: 0
Reputation: 154
This is a slightly longer solution, but here's another way to do it:
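found = False
s = 0  # index of the first string containing keywords[0]
c = 0  # one past the index of the last string containing keywords[1]
for i in range(len(example_list)):
    if not found and keywords[0] in example_list[i]:
        found = True
        s = i
    elif found and keywords[1] in example_list[i]:
        c = i + 1
out = example_list[s:c]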
Upvotes: 0
Reputation: 50809
You can find the indices of the strings that contain the keywords, and then slice the list using the indices of the first and last occurrences:
indices = [i for i, x in enumerate(example_list) if any(k in x for k in keywords)]
result = example_list[indices[0]:indices[-1] + 1]
# ['the sky is blue', 'it is raining outside', 'mic check']
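If no string contains any keyword, `indices` is empty and `indices[0]` raises an `IndexError`. A minimal sketch of a guarded version follows; the helper name `slice_between_keywords` is hypothetical, and returning an empty list in that case is an assumption about the desired behavior:

```python
def slice_between_keywords(strings, keywords):
    """Hypothetical helper: return the strings from the first to the last
    keyword match (inclusive), or an empty list if nothing matches."""
    indices = [i for i, s in enumerate(strings)
               if any(k in s for k in keywords)]
    if not indices:
        # Assumption: no match means there is nothing to extract
        return []
    return strings[indices[0]:indices[-1] + 1]


example_list = ['test sentence', 'the sky is blue', 'it is raining outside',
                'mic check', 'vacation time']
keywords = ['sky', 'check']

print(slice_between_keywords(example_list, keywords))
# ['the sky is blue', 'it is raining outside', 'mic check']
```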
Upvotes: 1