Raslav M

Reputation: 133

Python: Threads stopping without any reason

I am trying to write a hash-cracking application that checks every line of one file against every line of the rockyou dictionary. Pre-hashing rockyou brought the time to check one hash down to a few seconds, but that is still not enough, so I am moving my program to multithreading. However, my threads stop without raising any exceptions.

import threading
import datetime

class ThreadClass(threading.Thread):
    hash_list=0
    def file_len(fname):
        with open(fname) as f:
            for i, l in enumerate(f):
                pass
        return i + 1
    list_len=file_len("list.txt")

    def run(self):
        while ThreadClass.list_len>0:
            ThreadClass.list_len=ThreadClass.list_len-1
            print str(threading.current_thread())+":"+str(ThreadClass.list_len)
for i in range(20):
    try:
        t = ThreadClass()
        t.start()
    except:
        raise

When I run it, after some time only one thread is still printing. Why? Thanks for any help.

EDIT: One of the threads raises a KeyError. I don't know what that is.

Upvotes: 1

Views: 2818

Answers (1)

mata

Reputation: 69082

Calculating hashes is a CPU-bound problem, so multithreading won't help you in CPython because of the GIL.

If anything, you need to use multiprocessing. Using a Pool, your whole code could be reduced to something like:

import multiprocessing

def calculate(line):
    # ... calculate the hash ...
    return (line, 'calculated_result')

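if __name__ == '__main__':
    # the guard keeps worker processes from re-running this block
    # on platforms that start them by importing the module (e.g. Windows)
    pool = multiprocessing.Pool(multiprocessing.cpu_count())

    with open('input.txt') as inputfile:
        result = pool.map(calculate, inputfile)

    print(result)
    # compare results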
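The body of calculate and the comparison step are left open above. As a rough sketch only, assuming MD5 hashes (the question does not say which algorithm is used), they might look like the following. The helper find_matches and the sample words are hypothetical; the built-in map stands in for pool.map, which accepts the same function.

```python
import hashlib

def calculate(line):
    """Return (word, hex digest) for one dictionary line. MD5 is an assumption."""
    word = line.rstrip("\r\n")  # drop the trailing newline read from the file
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return (word, digest)

def find_matches(results, target_hashes):
    """Hypothetical helper: keep the pairs whose digest is one of the wanted hashes."""
    targets = set(target_hashes)  # set membership keeps each lookup cheap
    return [(word, digest) for word, digest in results if digest in targets]

# Serial stand-in for pool.map(calculate, inputfile)
results = list(map(calculate, ["123456\n", "password\n", "letmein\n"]))
matches = find_matches(results, ["5f4dcc3b5aa765d61d8327deb882cf99"])
print(matches)
```

Because calculate is a plain module-level function that only depends on its argument, it can be handed to pool.map unchanged.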

As to your problem with the threads: you're concurrently accessing ThreadClass.list_len from multiple threads. First you read it and compare it to 0. Then you read it again, decrement it and store it back, which is not thread-safe. Then you read it yet again when you print it. Between any of these operations, another thread could modify the value.

To show this, I've modified your code a little:

import threading
import datetime

lns = []
class ThreadClass(threading.Thread):
    hash_list=0
    list_len= 10000

    def run(self):
        while ThreadClass.list_len>0:
            ThreadClass.list_len=ThreadClass.list_len-1
            ln = ThreadClass.list_len        # copy for later use ...
            lns.append(ln)

threads = []
for i in range(20):
    t = ThreadClass()
    t.start()
    threads.append(t)

for t in threads:
    t.join()

print len(lns), len(set(lns)), min(lns)

When I run this 10 times, this is what I get:

13473 9999 -1
10000 10000 0
10000 10000 0
12778 10002 -2
10140 10000 0
10000 10000 0
15579 10000 -1
10866 9996 0
10000 10000 0
10164 9999 -1

So sometimes it seems to run fine, but at other times many values are added more than once, and list_len even manages to go negative.

If you disassemble the run method, you'll see this:

>>> import dis
>>> dis.dis(ThreadClass.run)
 11           0 SETUP_LOOP              57 (to 60)
        >>    3 LOAD_GLOBAL              0 (ThreadClass)
              6 LOAD_ATTR                1 (list_len)
              9 LOAD_CONST               1 (0)
             12 COMPARE_OP               4 (>)
             15 POP_JUMP_IF_FALSE       59

 12          18 LOAD_GLOBAL              0 (ThreadClass)
             21 LOAD_ATTR                1 (list_len)
             24 LOAD_CONST               2 (1)
             27 BINARY_SUBTRACT     
             28 LOAD_GLOBAL              0 (ThreadClass)
             31 STORE_ATTR               1 (list_len)

 13          34 LOAD_GLOBAL              0 (ThreadClass)
             37 LOAD_ATTR                1 (list_len)
             40 STORE_FAST               1 (ln)

 14          43 LOAD_GLOBAL              2 (lns)
             46 LOAD_ATTR                3 (append)
             49 LOAD_FAST                1 (ln)
             52 CALL_FUNCTION            1
             55 POP_TOP             
             56 JUMP_ABSOLUTE            3
        >>   59 POP_BLOCK           
        >>   60 LOAD_CONST               0 (None)
             63 RETURN_VALUE    

Simplified, you can say that between any two of these instructions another thread could run and modify something. To safely access a value from multiple threads, you need to synchronize the access.

Using threading.Lock, for example, the code could be modified like this:
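The listing above comes from Python 2. As a small sketch for Python 3, the same point can be checked programmatically with dis.get_instructions. The class Counter and the function decrement are hypothetical stand-ins for ThreadClass.list_len and the decrement in run, and the exact opcode names differ between interpreter versions.

```python
import dis

class Counter(object):
    value = 10  # hypothetical shared class attribute

def decrement():
    Counter.value = Counter.value - 1

# Collect only the opcode names; offsets and arguments vary by version.
opnames = [ins.opname for ins in dis.get_instructions(decrement)]
print(opnames)

# The attribute is read by one instruction and written back by a later one.
# A thread switch between the two can make an update get lost.
load_index = opnames.index("LOAD_ATTR")
store_index = opnames.index("STORE_ATTR")
print(load_index < store_index)
```

The read and the write being separate instructions is what makes the unsynchronized decrement unsafe.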

class ThreadClass(threading.Thread):
    # ...
    lock = threading.Lock()

    def run(self):
        while True:
            with self.lock:
                # code accessing shared variables inside lock
                if ThreadClass.list_len <= 0:
                    return
                ThreadClass.list_len -= 1
                list_len = ThreadClass.list_len   # store for later use...
            # not accessing shared state, outside of lock
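For completeness, here is a hedged, self-contained version of the earlier test with the lock applied. The per-thread claimed list is an addition for illustration, so that the work done outside the lock touches no shared state. With the check and the decrement under one lock, every run should report 10000 values, all distinct, with a minimum of 0.

```python
import threading

class ThreadClass(threading.Thread):
    list_len = 10000            # shared counter, as in the test above
    lock = threading.Lock()     # one lock shared by every instance

    def __init__(self):
        threading.Thread.__init__(self)
        self.claimed = []       # per-thread results, no locking needed

    def run(self):
        while True:
            with ThreadClass.lock:
                # check and decrement happen as one step under the lock
                if ThreadClass.list_len <= 0:
                    return
                ThreadClass.list_len -= 1
                ln = ThreadClass.list_len
            # outside the lock: only this thread's own list is touched
            self.claimed.append(ln)

threads = [ThreadClass() for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge the per-thread results once all threads have finished.
lns = [ln for t in threads for ln in t.claimed]
print(len(lns), len(set(lns)), min(lns))
```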

I'm not entirely sure that this is the cause of your problem, but it may be, especially if you're also reading from an input file in your run method.

Upvotes: 4
