blackmamba

Reputation: 15

Python: Extract sublist between strings containing keywords

I have a list of strings, and I want to extract all the strings between two strings that contain specific keywords (including those two strings).

example_list = ['test sentence', 'the sky is blue', 'it is raining outside', 'mic check', 'vacation time']
keywords = ['sky', 'check']

The result I want to achieve:

result = ['the sky is blue', 'it is raining outside', 'mic check']

So far I haven't been able to figure it out myself. Maybe it is possible with two loops and a regex?

Upvotes: 0

Views: 484

Answers (4)

Lue Mar

Reputation: 472

For each keyword, you have to check whether it occurs in each sentence, so you need two loops.

A simple way is to record the positions (indexes) of the matching sentences in the example list, then slice from the smallest to the largest position:

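example_list = ['test sentence', 'the sky is blue', 'it is raining outside', 'mic check', 'vacation time']
keywords = ['sky', 'check']

# Collect the position of every sentence that contains a keyword.
# enumerate() is used instead of list.index(), which would return only
# the first position when the same sentence appears more than once.
indexes = []
for k in keywords:
    for i, sentence in enumerate(example_list):
        if k in sentence:
            indexes.append(i)

# Slice from the first to the last matching position (inclusive).
# The built-in min() and max() are enough here; NumPy is not needed.
result = example_list[min(indexes):max(indexes) + 1]
print(result)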

This prints:

['the sky is blue', 'it is raining outside', 'mic check']

Upvotes: 0

tzot

Reputation: 95901

A generator solution that would work with any sequence of strings, not just a list:

def included(seq, start_text, end_text):
    do_yield = False
    for text in seq:
        if not do_yield and start_text in text:
            do_yield = True
        if do_yield:
            yield text
            if end_text in text:
                break

You can convert the result to a list with list(), of course.
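As a usage sketch, here is the generator applied to the lists from the question. The function is repeated so the snippet runs on its own. Passing an iterator instead of a list is only there to illustrate the "any sequence" point, and the names from_list and from_iterator are illustrative.

```python
def included(seq, start_text, end_text):
    do_yield = False
    for text in seq:
        # Start yielding at the first string that contains start_text.
        if not do_yield and start_text in text:
            do_yield = True
        if do_yield:
            yield text
            # Stop right after the first string that contains end_text.
            if end_text in text:
                break


example_list = ['test sentence', 'the sky is blue', 'it is raining outside',
                'mic check', 'vacation time']
keywords = ['sky', 'check']

# Materialize the generator into a list.
from_list = list(included(example_list, keywords[0], keywords[1]))
print(from_list)

# The same call works on a one-shot iterator, not only on a list.
from_iterator = list(included(iter(example_list), *keywords))
print(from_iterator)
```

Unlike the index-based approaches, this stops at the first string containing the end keyword, so later matches are not included.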

Upvotes: 0

Patrick Gorman

Reputation: 154

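This is a slightly longer solution, but here is another way to do it:

# Assumes example_list and keywords are defined as in the question.
found = False
s = 0  # index of the first string containing keywords[0]
c = 0  # one past the index of the last string containing keywords[1]
for i, sentence in enumerate(example_list):
    if not found and keywords[0] in sentence:
        found = True
        s = i
    elif found and keywords[1] in sentence:
        c = i + 1
out = example_list[s:c]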

Upvotes: 0

Guy

Reputation: 50809

You can find the indices of the strings that contain any of the keywords, and then slice the list from the first to the last of those indices:

indices = [i for i, x in enumerate(example_list) if any(k in x for k in keywords)]
result = example_list[indices[0]:indices[-1] + 1]
# ['the sky is blue', 'it is raining outside', 'mic check']
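One caveat: if no string contains any keyword, indices is empty and indices[0] raises an IndexError. Below is a minimal sketch of a guarded version, assuming an empty list is an acceptable result in that case. The function name slice_between_keywords is hypothetical and not part of the original answer.

```python
def slice_between_keywords(strings, keywords):
    """Return the slice from the first to the last string containing any keyword.

    Hypothetical helper: returns [] when nothing matches (an assumption
    about the desired behavior, not something the question specifies).
    """
    indices = [i for i, s in enumerate(strings)
               if any(k in s for k in keywords)]
    if not indices:
        return []
    return strings[indices[0]:indices[-1] + 1]


example_list = ['test sentence', 'the sky is blue', 'it is raining outside',
                'mic check', 'vacation time']
keywords = ['sky', 'check']

result = slice_between_keywords(example_list, keywords)
print(result)
```

Note that this approach treats all keywords alike: it does not require the "start" keyword to appear before the "end" keyword, it only spans the first and last matching strings.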

Upvotes: 1
