Kaung Myat

Reputation: 107

How do I match a list of sentences against a list of keywords?

I want to build a new list containing the sentences that match any keyword from a list of keywords.

lst = ['This sentence contains disclosure.', 'This sentence contains none declared.', 'This sentence contains competing interest.', 'This sentence contains authors declare.']
keywords = ['disclosure', 'none declared', 'interest']

The new list should come out like this:

matched_list = ['This sentence contains disclosure.', 'This sentence contains none declared.', 'This sentence contains competing interest.']

I have tried using

import re
r = re.compile('.*disclosure')
newlist = list(filter(r.match, lst))

However, I have a very large list of keywords, and it would be impractical to type them all into r = re.compile('.*keywords'). Is there another way to match a list of sentences against a list of keywords?

Upvotes: 4

Views: 1588

Answers (1)

cs95

Reputation: 402263

You will have to check each string against the keyword list. If plain substring matching is enough (no regex needed), use a list comprehension:

matched_list = [
    string for string in lst if any(
        keyword in string for keyword in keywords)]

Which is really just a concise way of saying:

matched_list = []
for string in lst:
    if any(keyword in string for keyword in keywords):
        matched_list.append(string)

any short-circuits: it returns True as soon as one keyword matches, and returns False if none do.
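As a rough, self-contained sketch of the substring approach: the helper name match_sentences and the sample data are only illustrative, and the whitespace stripping and lowercasing are assumptions about what you want, not something the question asked for. They are worth considering because substring tests are exact: a keyword such as 'disclosure ' with a trailing space will not match 'disclosure.'.

```python
def match_sentences(sentences, keywords):
    """Return the sentences that contain at least one keyword (case-insensitive)."""
    # Normalize the keywords once: strip stray whitespace, lowercase, drop empties.
    cleaned = [k.strip().lower() for k in keywords if k.strip()]
    # any() stops at the first keyword found in a given sentence.
    return [s for s in sentences if any(k in s.lower() for k in cleaned)]


sentences = [
    'This sentence contains disclosure.',
    'This sentence contains none declared.',
    'This sentence contains competing interest.',
    'This sentence contains authors declare.',
]
keywords = ['disclosure ', 'None Declared']  # note the trailing space and the capitals

print(match_sentences(sentences, keywords))
# ['This sentence contains disclosure.', 'This sentence contains none declared.']
```

Keep in mind that this is still plain substring matching, so a keyword like 'interest' would also match a sentence containing 'interesting'.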


If you want to use regex, precompile a single pattern that joins the escaped keywords with |, then call its search method on each string:

import re
p = re.compile('|'.join(map(re.escape, keywords)))
matched_list = [string for string in lst if p.search(string)]
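If you need whole-word or case-insensitive matching, the regex version can be extended along these lines. This is a sketch under a few assumptions: build_keyword_pattern is a made-up helper name, the sample data is illustrative, and the \b word boundaries only behave as expected when every keyword starts and ends with a letter, digit or underscore. Because search scans the whole string, no leading .* is needed.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive, whole-word alternation from the keywords."""
    # re.escape keeps characters such as '.' or '+' in a keyword literal.
    escaped = [re.escape(k) for k in keywords if k]
    if not escaped:
        return None  # an empty pattern would match every string
    return re.compile(r'\b(?:' + '|'.join(escaped) + r')\b', re.IGNORECASE)


lst = [
    'This sentence contains disclosure.',
    'This sentence contains None Declared.',
    'This sentence is interesting.',
    'This sentence contains authors declare.',
]
keywords = ['disclosure', 'none declared', 'interest']

p = build_keyword_pattern(keywords)
matched_list = [s for s in lst if p and p.search(s)]

print(matched_list)
# ['This sentence contains disclosure.', 'This sentence contains None Declared.']
```

Here 'interesting' is not matched, because \b requires the keyword 'interest' to end at a word boundary.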

Upvotes: 2
