Reputation: 23
Setup: init.py
import threading
... other imports ...
... vars ...

for drive in drives:
    series = folder.getFolders(drive)
    for serie in series:
        print(str(datetime.datetime.now()))
        t = threading.Thread(target=serienchecker,
                             args=(drive, serie, blacklist, apikeyv3, language,))
        t.start()
        t.join()
serienchecker.py
from threading import Thread
from themoviedb import *
from folderhelper import *

class serienchecker(Thread):
    ...
    def __init__(self, path, seriesname, blacklist, apikeytmdb='', language='eng'):
        ...
        self.startSearch()
        ...
    def startSearch(self):
        print("start")
        ...
Output:
2017-02-08 21:29:04.481536
start
2017-02-08 21:29:17.385611
start
2017-02-08 21:30:00.548471
start
But I want them all to run at roughly the same time. Is there maybe even a way to queue all the tasks and process N threads simultaneously? [This is just a small example; the script will check several hundred folders.] What am I doing wrong?
I have tried several approaches and nothing worked. Please help me!
Thanks!
Edit://
def job():
    while jobs:
        tmp = jobs.pop()
        task(drive=tmp[0], serie=tmp[1])

def task(drive, serie):
    print("Serie[{0}]".format(serie))
    sc = serienchecker(drive, serie, blacklist, apikeyv3, language)
    sc.start()
    result = sc.result
    resultString = ''
    for obj in result:
        resultString += obj + "\n"
    print(resultString)

for drive in drives:
    series = folder.getFolders(drive)
    for serie in series:
        jobs.append([drive, serie])

while jobs:
    job()
Upvotes: 0
Views: 133
Reputation: 77337
As already mentioned, you need to postpone the join until after all threads have started. Consider using a ThreadPool, which limits the number of concurrent threads and can be re-implemented as a process pool if Python's GIL slows processing. It does the thread start, dispatch and join for you.
import multiprocessing
import platform

...

# helper functions for the process pool
#
# linux - a worker process gets a view of the parent's memory at the time
# the pool is created, including global variables that exist at that time.
#
# windows - a new process is created and all needed state must be
# passed to the child. we could pass these values on every call,
# but assuming blacklist is large, it's more efficient to set them
# up once

do_init = platform.system() == "Windows"

if do_init:
    def init_serienchecker_process(_blacklist, _apikeyv3, _language):
        """Called once when a process pool worker is created to set static config."""
        global blacklist, apikeyv3, language
        blacklist, apikeyv3, language = _blacklist, _apikeyv3, _language

# this is the worker in the child process. It is called with the items
# iterated in the parent's Pool.map call. In our case, the item is a
# (drive, serie) tuple. Unpack, combine with the globals and call the
# real function.
def serienchecker_worker(drive_serie):
    """Calls serienchecker with the global blacklist, apikeyv3 and language
    set by init_serienchecker_process."""
    return serienchecker(drive_serie[0], drive_serie[1], blacklist,
                         apikeyv3, language)

def drive_serie_iter(folder, drives):
    """Yields (drive, serie) tuples."""
    for drive in drives:
        for serie in folder.getFolders(drive):
            yield drive, serie

# decide the number of workers. Here I just chose an arbitrary max value,
# but your number will depend on your desired workload.
max_workers = 24
work_items = list(drive_serie_iter(folder, drives))
num_workers = min(len(work_items), max_workers)

# set up the process pool. Windows needs the globals passed in via the
# initializer, but since linux workers already have a view of the parent's
# globals, it's not needed there.
initializer = init_serienchecker_process if do_init else None
initargs = (blacklist, apikeyv3, language) if do_init else ()
pool = multiprocessing.Pool(num_workers, initializer=initializer,
                            initargs=initargs)

# map calls serienchecker_worker in a subprocess for each (drive, serie)
# pair produced by drive_serie_iter
for result in pool.map(serienchecker_worker, work_items):
    print(result)  # not sure you care what the results are

pool.close()
pool.join()
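If the checks turn out to be mostly I/O-bound, a thread pool is enough and avoids the Windows initializer dance entirely, since threads share the parent's memory. multiprocessing.pool.ThreadPool has the same interface; here is a minimal sketch reusing serienchecker_worker, work_items and num_workers from above:
from multiprocessing.pool import ThreadPool

# threads see the parent's globals directly, so no initializer/initargs needed
pool = ThreadPool(num_workers)
for result in pool.map(serienchecker_worker, work_items):
    print(result)
pool.close()
pool.join()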
Upvotes: 0
Reputation: 3069
join() waits until the thread ends, so you should not call it right after starting a thread (otherwise you don't create a new thread until the previous one has finished).
Create a list to store your threads at the beginning:
threads = []
Then add your threads to the list when you create them:
threads.append(t)
At the end of your program, join all the threads:
for t in threads:
    t.join()
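Applied to the loop from the question, that looks roughly like this (a sketch; it assumes drives, folder, blacklist, apikeyv3 and language exist as in the original init.py):
threads = []
for drive in drives:
    for serie in folder.getFolders(drive):
        t = threading.Thread(target=serienchecker,
                             args=(drive, serie, blacklist, apikeyv3, language))
        t.start()
        threads.append(t)

# join only after every thread has been started
for t in threads:
    t.join()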
Upvotes: 2