Pythonic way to compare a list of words against a list of sentences and print the matching line

Question

I'm currently cleaning out our database and it's becoming very time-consuming. The typical

for email in emails:

loop is nowhere near fast enough.

For instance, I am currently comparing a list of 230,000 emails against a 39,000,000-line list of full records. It would take hours to match these emails to the record lines they belong to and print them. Does anyone have any idea how to implement threading in this query to speed it up? This, on the other hand, is super fast:

strings = ("string1", "string2", "string3")
for line in file:
    if any(s in line for s in strings):
        print("yay!")

That would never print the matching line, just the needle.

Thank you in advance.

Filip Młynarski · Accepted Answer

Here's an example solution using threads. This code splits your data into equal chunks and passes each chunk to compare() in its own thread. The number of threads is whatever we declare up front.

strings = ("string1", "string2", "string3")
lines = ['some random', 'lines with string3', 'and without it',
         '1234', 'string2', 'string1',
         'string1', 'abcd', 'xyz']

def compare(x, thread_idx):
    # x is the chunk of lines handled by this thread
    print('Thread-{} started'.format(thread_idx))
    for line in x:
        # report the whole line as soon as any search string occurs in it
        if any(s in line for s in strings):
            print("We got one of strings in line: {}".format(line))
    print('Thread-{} finished'.format(thread_idx))

Threading part:

from threading import Thread

threads = []
threads_amount = 3
# ceiling division, so leftover lines go into the last chunk instead of being dropped
chunk_size = max(1, -(-len(lines) // threads_amount))

for idx, start in enumerate(range(0, len(lines), chunk_size), start=1):
    threads.append(Thread(target=compare, args=(lines[start:start + chunk_size], idx)))
    threads[-1].start()

# wait for every thread that was actually started
for thread in threads:
    thread.join()
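
If you would rather collect the matching lines than print them from inside each thread, the same chunking idea can be sketched with concurrent.futures from the standard library. This is only a sketch: find_matches(), chunked() and parallel_matches() are names made up for illustration. Keep in mind that in CPython the global interpreter lock generally keeps threads from running Python bytecode in parallel, so the speed-up for plain substring checks may be small.

```python
from concurrent.futures import ThreadPoolExecutor

strings = ("string1", "string2", "string3")
lines = ['some random', 'lines with string3', 'and without it',
         '1234', 'string2', 'string1',
         'string1', 'abcd', 'xyz']


def find_matches(chunk):
    # Return the matching lines instead of printing them from the worker.
    return [line for line in chunk if any(s in line for s in strings)]


def chunked(seq, size):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def parallel_matches(all_lines, workers=3):
    size = max(1, -(-len(all_lines) // workers))  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input chunks.
        results = pool.map(find_matches, chunked(all_lines, size))
        return [line for chunk_result in results for line in chunk_result]


matches = parallel_matches(lines)
for line in matches:
    print(line)
```

Because map() keeps the chunk order, the matches come back in the same order as the input lines, unlike the interleaved prints from separate threads.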

Output:

Thread-1 started
Thread-2 started
Thread-3 started
We got one of strings in line: string2
We got one of strings in line: string1
We got one of strings in line: string1
We got one of strings in line: lines with string3
Thread-2 finished
Thread-3 finished
Thread-1 finished
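
For a needle list as large as the 230,000 emails in the question, it may help more to avoid testing every needle against every line. A possible sketch, not part of the answer above: pull email-like tokens out of each line and look them up in a set. The sample data, the deliberately simple EMAIL_RE pattern, the case-insensitive comparison and the name matching_lines() are all assumptions for illustration. Your records may need a different pattern.

```python
import re

# Hypothetical sample data standing in for the real email list and records file.
emails = {"alice@example.com", "bob@example.com"}
records = [
    "1,Alice,alice@example.com,London",
    "2,Carol,carol@example.com,Paris",
    "3,Bob,BOB@example.com,Berlin",
]

# Assumption: a simple pattern that finds candidate addresses in a line.
# It is not a full email validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def matching_lines(lines, wanted):
    # Lower-case both sides so the comparison ignores case (an assumption).
    wanted = {e.lower() for e in wanted}
    for line in lines:
        candidates = EMAIL_RE.findall(line.lower())
        # Set membership is a constant-time lookup on average, however many emails there are.
        if any(c in wanted for c in candidates):
            yield line  # yield the whole original line, not just the needle


found = list(matching_lines(records, emails))
for line in found:
    print(line)
```

Since matching_lines() is a generator, it can be fed an open file object directly, so the 39,000,000 lines never have to sit in memory at once.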
