Reputation: 2266
I have a list of image paths that I want to divide between processes OR threads, so that each process handles some part of the list. Processing involves loading an image from disk, doing some computation, and returning the result. I'm using Python 2.7's multiprocessing.Pool.
Here's how I create the worker processes:
def ProcessParallel(classifier, path):
    files = glob.glob(path + "\*.png")
    files_sorted = sorted(files, key=lambda file_name: int(file_name.split('--')[1]))
    p = multiprocessing.Pool(processes=4, initializer=Initializer, initargs=(classifier,))
    data = p.map(LoadAndClassify, files_sorted)
    return data
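For reference, here is a minimal, self-contained sketch of the same Pool-with-initializer pattern (the `init_worker` and `square` names are placeholders, not the question's real functions); it logs the wall-clock time at which each worker's initializer runs, which is how the timestamps below were gathered:

```python
import multiprocessing
import time

def init_worker(tag):
    # Runs once in each worker process as that worker starts up.
    print("%s worker started at %.2f" % (tag, time.time()))

def square(x):
    return x * x

if __name__ == "__main__":
    # On Windows the __main__ guard is mandatory: child processes re-import
    # this module, and without the guard each import would spawn more workers.
    pool = multiprocessing.Pool(processes=4,
                                initializer=init_worker,
                                initargs=("demo",))
    print(pool.map(square, range(8)))
    pool.close()
    pool.join()
```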
The issue I'm facing is that when I log the initialization time in my Initializer function, I can see the workers aren't initialized in parallel; instead, each worker starts about 5 seconds after the previous one. Here are the logs for reference:
2016-08-08 12:38:32,043 - custom_logging - INFO - Worker started
2016-08-08 12:38:37,647 - custom_logging - INFO - Worker started
2016-08-08 12:38:43,187 - custom_logging - INFO - Worker started
2016-08-08 12:38:48,634 - custom_logging - INFO - Worker started
I've tried using multiprocessing.pool.ThreadPool instead, which starts all its workers at the same time.
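A minimal sketch of that ThreadPool variant (with hypothetical `thread_init` and `work` functions): ThreadPool exposes the same interface as Pool, but its workers are threads in the parent process, so there is no per-worker interpreter spawn or argument pickling, and all initializers fire almost simultaneously:

```python
import threading
import time
from multiprocessing.pool import ThreadPool

def thread_init():
    # Threads share the parent's memory; start times land within
    # milliseconds of each other rather than seconds apart.
    print("%s started at %.2f" % (threading.current_thread().name, time.time()))

def work(x):
    return x * x

pool = ThreadPool(processes=4, initializer=thread_init)
results = pool.map(work, range(8))
pool.close()
pool.join()
```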
I know how multiprocessing works on Windows: we have to place a main guard to keep our code from spawning processes endlessly. The issue in my case is that my script is hosted on IIS using FastCGI, so it isn't run as main; it's executed by the FastCGI process (a wfastcgi.py script is responsible for that). There is a main guard inside wfastcgi.py, and the logs confirm I'm not creating an unbounded number of processes.
What I want to know is why the multiprocessing Pool doesn't start its workers simultaneously. I'll really appreciate any help.
EDIT 1: Here's my Initializer function:
def Initializer(classifier):
    global indexing_classifier
    logger.info('Worker started')
    indexing_classifier = classifier
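For context, this initializer-sets-a-global pattern works because the initializer runs once inside each worker process, so the module-level global it assigns is visible to every task that worker later executes. A self-contained sketch, with a hypothetical `DummyClassifier` standing in for the real classifier:

```python
import multiprocessing

class DummyClassifier(object):
    # Placeholder for the question's real classifier object.
    def classify(self, path):
        return len(path)

def init_classifier(classifier):
    # Stash the classifier (pickled and sent to each worker via initargs)
    # in a module-level global so the task function can reach it.
    global indexing_classifier
    indexing_classifier = classifier

def load_and_classify(path):
    return indexing_classifier.classify(path)

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2,
                                initializer=init_classifier,
                                initargs=(DummyClassifier(),))
    print(pool.map(load_and_classify, ["a.png", "bb.png"]))
    pool.close()
    pool.join()
```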
Upvotes: 5
Views: 1750
Reputation: 1
I had many issues trying to run multiprocessing under CGI/WSGI; it works fine locally, but not on real web servers. Ultimately it just isn't compatible. If you need to do multiprocessing, send async jobs off to something like Celery.
Upvotes: 0