theunknownsolider

Reputation: 23

Python threading isn't working

Setup: init.py

import threading
...other imports ...

... vars ...

for drive in drives:
    series = folder.getFolders(drive)
    for serie in series:        
        print(str(datetime.datetime.now()))
        t = threading.Thread(target=serienchecker, args=(drive, serie, blacklist,apikeyv3,language,))        
        t.start()
        t.join()

serienchecker.py

from threading import Thread
from themoviedb import *
from folderhelper import *
class serienchecker(Thread):
    ...
    def __init__(self, path,seriesname, blacklist, apikeytmdb='', language='eng'):
        ...
        self.startSearch()
    ...

    def startSearch(self):
        print("start")
        ...

Output:

2017-02-08 21:29:04.481536
start
2017-02-08 21:29:17.385611
start
2017-02-08 21:30:00.548471
start

But I want them all to run at around the same time. Is there maybe even a way to queue all the tasks and process N threads simultaneously? [This is just a small example; the script would check several hundred folders.] What am I doing wrong?

I have tried several approaches, but nothing worked. Please help me.

Thanks!

Edit:

def job():
    while jobs:
        tmp = jobs.pop()
        task(drive=tmp[0], serie=tmp[1])

def task(drive, serie):
    print("Serie[{0}]".format(serie))
    sc = serienchecker(drive, serie, blacklist, apikeyv3, language)
    sc.start()
    result = sc.result
    resultString = ''
    for obj in result:
        resultString += obj + "\n"
    print(resultString)

for drive in drives:
    series = folder.getFolders(drive)
    for serie in series:
        jobs.append([drive,serie])

while jobs:
    job()

Upvotes: 0

Views: 133

Answers (2)

tdelaney

Reputation: 77337

As already mentioned, you need to postpone the join until after all threads have started. Consider using a thread pool, which limits the number of concurrent threads and can be re-implemented as a process pool if Python's GIL slows processing. It does the thread start, dispatch and join for you.

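For reference, the thread-pool variant is only a few lines. This is a minimal sketch that assumes serienchecker has been refactored into a plain function that returns its result (the process-pool code below makes the same assumption); multiprocessing.pool.ThreadPool shares the Pool API:

from multiprocessing.pool import ThreadPool

# worker: receives one (drive, serie) tuple from pool.map below
def check_serie(drive_serie):
    drive, serie = drive_serie
    return serienchecker(drive, serie, blacklist, apikeyv3, language)

items = [(drive, serie)
         for drive in drives
         for serie in folder.getFolders(drive)]

pool = ThreadPool(min(len(items), 24))  # cap the number of concurrent threads
for result in pool.map(check_serie, items):
    print(result)
pool.close()
pool.join()

The process-pool version needs a bit more setup, because on Windows the worker processes do not inherit the parent's globals: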
import multiprocessing
import itertools
import platform

...

# helper functions for process pool
#
#     linux - worker process gets a view of parent memory at time pool
#     is created, including global variables that exist at that time.
#     
#     windows - a new process is created and all needed state must be
#     passed to the child. we could pass these values on every call,
#     but assuming blacklist is large, its more efficient to set it
#     up once

do_init = platform.system() == "Windows"

if do_init:

    def init_serienchecker_process(_blacklist, _apikeyv3, _language):
        """Call once when process pool worker created to set static config"""
        global blacklist, apikeyv3, language
        blacklist, apikeyv3, language = _blacklist, _apikeyv3, _language

# this is the worker in the child process. It is called with items iterated
# in the parent Pool.map function. In our case, the item is a (drive, serie)
# tuple. Unpack, combine w/ globals and call the real function.

def serienchecker_worker(drive_serie):
    """Calls serienchecker with global blacklist, apikeyv3, language set by
    init_serienchecker_process"""
    return serienchecker(drive_serie[0], drive_serie[1], blacklist,
        apikeyv3, language)

def drive_serie_iter(folder, drives):
    """Yields (drive, serie) tuples"""
    for drive in drives:
        for serie in folder.getFolders(drive):
            yield drive, serie


# decide the number of workers. Here I just chose an arbitrary max value,
# but your number will depend on your desired workload.

max_workers = 24
items = list(drive_serie_iter(folder, drives))
num_workers = min(len(items), max_workers)

# setup a process pool. we need to initialize windows with the global
# variables but since linux already has a view of the globals, its 
# not needed

initializer = init_serienchecker_process if do_init else None
initargs = (blacklist, apikeyv3, language) if do_init else ()
pool = multiprocessing.Pool(num_workers, initializer=initializer,
    initargs=initargs)

# map calls serienchecker_worker in the subprocess for each (drive, serie)
# pair in items

for result in pool.map(serienchecker_worker, items):
    print(result) # not sure you care what the results are

pool.close()
pool.join()

Upvotes: 0

Fabich

Reputation: 3069

join() waits until the thread ends, so you should not call it right after starting a thread (otherwise you can't create a new thread until the previous one ends).
Create a list to store your threads at the beginning:

threads = []

Then add your threads to the list when you create them :

threads.append(t)

At the end of your program, join all the threads:

for t in threads:
    t.join()
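
Applied to the loop from the question, a minimal sketch:

threads = []
for drive in drives:
    for serie in folder.getFolders(drive):
        t = threading.Thread(target=serienchecker,
                             args=(drive, serie, blacklist, apikeyv3, language))
        t.start()          # start them all first ...
        threads.append(t)

for t in threads:
    t.join()               # ... then wait for all of them

Note that this starts one thread per serie at once; with several hundred folders you may want a pool (see the other answer) to cap concurrency.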

Upvotes: 2
