Searching a large list of strings for keywords in Python

I have two lists, message and keyword. They look like this:

message = ["my name is blabla",'x-men is a good movie','i deny that fact']
keyword = ['x-men','name is','psycho movie']

I want to build a new list containing the keywords that appear in any of the messages.

import re

newList = []
for text in message:
    for kw in keyword:
        # re.escape keeps characters in the keyword from being read as regex syntax
        if re.search(r'\b{}\b'.format(re.escape(kw)), text):
            newList.append(kw)
print(newList)

My Python code is above. The problem is that each sentence in my message list is around 100 to 150 words long and the list has 3,000 entries. Each keyword may be one or two words, and the keyword list has 12,000 entries.

So the search takes a long time. Is there a faster way to do it?

This question is different because of the large amount of data in both lists.

Upvotes: 0

Views: 1363

Answers (3)

dawg

Reputation: 103824

You can significantly reduce the cost of the keyword search by joining the list message into a single delimited string and then searching for each keyword in that string:

>>> ms='\t'.join(message)
>>> [e for e in keyword if e in ms]
['x-men', 'name is']

The same method would work with a regex with the same benefit:

>>> import re
>>> [e for e in keyword if re.search(r'\b' + re.escape(e) + r'\b', ms)]

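This reduces the number of search calls from M*N to N (for M messages and N keywords). Each call now scans the whole joined string, so the saving comes mainly from the lower per-call overhead.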
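
A different technique from the join above, sketched under the question's stated assumption that keywords are one or two words long: split each message into words, build its one- and two-word phrases, and look them up in a set of keywords. The name find_keywords and the max_words parameter are made up for illustration. Splitting on whitespace is also an assumption: unlike \b, it will not match a keyword that is directly followed by punctuation.

```python
def find_keywords(messages, keywords, max_words=2):
    """Return the keywords that occur as whole-word phrases in any message."""
    keyword_set = set(keywords)
    found = set()
    for text in messages:
        words = text.split()  # assumption: words are separated by whitespace
        for n in range(1, max_words + 1):
            # Slide a window of n words across the message.
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                if phrase in keyword_set:
                    found.add(phrase)
    # Report matches in the original keyword order.
    return [k for k in keywords if k in found]


message = ["my name is blabla", "x-men is a good movie", "i deny that fact"]
keyword = ["x-men", "name is", "psycho movie"]
print(find_keywords(message, keyword))  # ['x-men', 'name is']
```

Each set lookup takes constant time on average, so the work grows with the total number of words in the messages instead of with messages times keywords.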

Upvotes: 1

RomanPerekhrest

Reputation: 92854

With the built-in any() function:

To search by simple occurrence:

message = ["my name is blabla",'x-men is a good movie','i deny that fact']
keyword = ['x-men','name is','psycho movie']

result = [k for k in keyword if any(k in m for m in message)]
print(result)

The output:

['x-men', 'name is']

----------

If you need to search by exact words:

import re

message = ["my name is blabla",'x-men is a good movie','i deny that fact']
keyword = ['x-men','name is','psycho movie']

result = [k for k in keyword if any(re.search(r'\b{}\b'.format(re.escape(k)), m) for m in message)]
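
With thousands of keywords, it may help to compile each pattern once up front. The re module keeps only a limited cache of compiled patterns, so building the pattern string inside the loop can lead to repeated compilation. The following is a minimal sketch of that idea. The name patterns is chosen here for illustration, and any speed-up should be measured on your own data.

```python
import re

message = ["my name is blabla", "x-men is a good movie", "i deny that fact"]
keyword = ["x-men", "name is", "psycho movie"]

# Compile every keyword once; re.escape makes characters such as "." or "+" literal.
patterns = [(k, re.compile(r'\b{}\b'.format(re.escape(k)))) for k in keyword]

# any() stops at the first message that contains the keyword.
result = [k for k, pattern in patterns if any(pattern.search(m) for m in message)]
print(result)  # ['x-men', 'name is']
```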

Upvotes: 2

APorter1031

Reputation: 2256

Try using a nested list comprehension:

result = [key for key in keyword for text in message if key in text]
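
One thing to watch for: a nested comprehension like this adds a keyword once for every message that contains it. The sketch below shows the effect with an extra keyword 'is', added purely for illustration, and removes the duplicates with dict.fromkeys, which keeps the first occurrence of each item in order.

```python
message = ["my name is blabla", "x-men is a good movie", "i deny that fact"]
keyword = ["x-men", "name is", "psycho movie", "is"]  # "is" added for illustration

# One entry per (keyword, message) pair that matches by substring.
matches = [key for key in keyword for text in message if key in text]
print(matches)  # ['x-men', 'name is', 'is', 'is']

# Drop duplicates while preserving order.
unique = list(dict.fromkeys(matches))
print(unique)  # ['x-men', 'name is', 'is']
```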

Upvotes: 0
