Reputation: 676
I have two lists, message and keyword. They look like this:
message = ["my name is blabla",'x-men is a good movie','i deny that fact']
keyword = ['x-men','name is','psycho movie']
I want to make a new list which contains keywords that are present in the message.
import re

newList = []
for message_index in message:
    print(newList)  # debug output: matches collected so far
    for kw in keyword:
        # \b anchors the keyword at word boundaries
        if re.search(r'\b{}\b'.format(kw), message_index):
            newList.append(kw)
My Python code is above. The problem is that each sentence in my message list is around 100 to 150 words long and the list has 3,000 entries. Each keyword may be one or two words, and the keyword list has 12,000 entries.
So the search takes a long time. Is there a faster way to do it?
This question is different because of the large amount of data in both lists.
Upvotes: 0
Views: 1363
Reputation: 103824
You can significantly reduce the complexity of your keyword search by joining the list message into a delimited string and then searching for each keyword in that string:
>>> ms='\t'.join(message)
>>> [e for e in keyword if e in ms]
['x-men', 'name is']
The same method would work with a regex with the same benefit:
>>> import re
>>> [e for e in keyword if re.search(r'\b'+e+r'\b', ms)]
This reduces the number of searches from O(M*N) to O(N), where M is the number of messages and N is the number of keywords.
...
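As a self-contained sketch of the joined-string idea, the snippet below wraps it in a function. The function name keywords_in_messages and the sep parameter are my own, not from the question, and the re.escape call is an added assumption: it treats every keyword as literal text, which matters if a keyword contains characters such as '.' or '+'.

```python
import re

def keywords_in_messages(messages, keywords, sep="\t"):
    """Return the keywords that occur as whole words in any message."""
    # Join once so each keyword needs a single search instead of one per message.
    joined = sep.join(messages)
    found = []
    for kw in keywords:
        # re.escape keeps regex metacharacters in a keyword literal.
        if re.search(r"\b" + re.escape(kw) + r"\b", joined):
            found.append(kw)
    return found

message = ["my name is blabla", "x-men is a good movie", "i deny that fact"]
keyword = ["x-men", "name is", "psycho movie"]
print(keywords_in_messages(message, keyword))  # ['x-men', 'name is']
```

The tab separator keeps a two-word keyword from matching across the boundary between two messages, as long as no keyword itself contains a tab.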
Upvotes: 1
Reputation: 92854
With the built-in any() function:
To search by simple occurrence:
message = ["my name is blabla",'x-men is a good movie','i deny that fact']
keyword = ['x-men','name is','psycho movie']
result = [k for k in keyword if any(k in m for m in message)]
print(result)
The output:
['x-men', 'name is']
----------
If you need to search by exact words:
import re
message = ["my name is blabla",'x-men is a good movie','i deny that fact']
keyword = ['x-men','name is','psycho movie']
result = [k for k in keyword if any(re.search(r'\b{}\b'.format(k), m) for m in message)]
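With 12,000 keywords it may help to compile each pattern once instead of formatting it again for every message, since the re module only keeps a limited cache of compiled patterns. The following is a rough sketch of that variation, not a measured benchmark; the patterns dictionary and the use of re.escape are my additions.

```python
import re

message = ["my name is blabla", "x-men is a good movie", "i deny that fact"]
keyword = ["x-men", "name is", "psycho movie"]

# Compile each keyword pattern once, outside the loop over messages.
# re.escape treats the keyword as literal text (an added assumption).
patterns = {k: re.compile(r"\b{}\b".format(re.escape(k))) for k in keyword}

# any() stops at the first message that contains the keyword.
result = [k for k, p in patterns.items() if any(p.search(m) for m in message)]
print(result)  # ['x-men', 'name is']
```

Because the patterns are stored in a dictionary, a keyword listed twice is only checked and reported once.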
Upvotes: 2
Reputation: 2256
Try using a nested list comprehension (named result here so it does not shadow the built-in list):
result = [key for key in keyword for word in message if key in word]
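One thing to watch with the nested comprehension: a keyword is emitted once per message that contains it, so it can appear several times. The sketch below uses a hypothetical message list (the third sentence is made up for illustration) and removes the repeats with dict.fromkeys, which keeps first-seen order.

```python
# Hypothetical data: 'name is' occurs in two different messages.
message = ["my name is blabla", "x-men is a good movie", "his name is unknown"]
keyword = ["x-men", "name is", "psycho movie"]

# One entry per (keyword, message) pair that matches.
matches = [key for key in keyword for word in message if key in word]
print(matches)  # ['x-men', 'name is', 'name is']

# Drop repeats while preserving the original keyword order.
unique = list(dict.fromkeys(matches))
print(unique)  # ['x-men', 'name is']
```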
Upvotes: 0