Ahmed

Reputation: 2266

Python multiprocessing Pool vs multiprocessing ThreadPool

I have a list of image paths that I want to divide between processes or threads so that each worker handles part of the list. Processing consists of loading an image from disk, doing some computation, and returning the result. I'm using Python 2.7 and multiprocessing.Pool.

Here's how I create the worker processes:

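import glob
import multiprocessing
import os

def ProcessParallel(classifier, path):
    files = glob.glob(os.path.join(path, "*.png"))
    # Sort numerically on the second '--'-delimited field of each path.
    files_sorted = sorted(files, key=lambda file_name: int(file_name.split('--')[1]))
    # Each worker calls Initializer(classifier) once when it starts.
    p = multiprocessing.Pool(processes=4, initializer=Initializer, initargs=(classifier,))
    data = p.map(LoadAndClassify, files_sorted)
    return data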

The issue I'm facing is that when I log the initialization time in my Initializer function, I can see that the workers aren't initialized in parallel. Instead, each worker starts roughly 5.5 seconds after the previous one. Here are the logs for reference:

2016-08-08 12:38:32,043 - custom_logging - INFO - Worker started
2016-08-08 12:38:37,647 - custom_logging - INFO - Worker started
2016-08-08 12:38:43,187 - custom_logging - INFO - Worker started
2016-08-08 12:38:48,634 - custom_logging - INFO - Worker started 
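
To put numbers on the delay, the gaps between consecutive "Worker started" entries can be computed directly from the timestamps. This is a minimal sketch that only parses the four log lines quoted above; the helper names `parse_timestamp` and `start_gaps` are made up for illustration.

```python
from datetime import datetime

# The four log lines quoted above, copied verbatim.
LOG_LINES = [
    "2016-08-08 12:38:32,043 - custom_logging - INFO - Worker started",
    "2016-08-08 12:38:37,647 - custom_logging - INFO - Worker started",
    "2016-08-08 12:38:43,187 - custom_logging - INFO - Worker started",
    "2016-08-08 12:38:48,634 - custom_logging - INFO - Worker started",
]

def parse_timestamp(line):
    """Extract the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp from a log line."""
    stamp = line.split(" - ")[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")

def start_gaps(lines):
    """Return the seconds elapsed between each pair of consecutive log lines."""
    times = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

gaps = start_gaps(LOG_LINES)
print([round(gap, 3) for gap in gaps])  # [5.604, 5.54, 5.447]
```

The three gaps are close to each other, which suggests a fixed per-worker startup cost paid one worker at a time rather than a one-off delay.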

I've tried using multiprocessing.pool.ThreadPool instead, which starts all the workers at the same time.
I know how multiprocessing works on Windows, and that a main guard is needed to keep the code from spawning processes endlessly. The issue in my case is that my script is hosted on IIS using FastCGI and is not the main module; it is run by the FastCGI process (a wfastcgi.py script is responsible for that). There is a main guard inside wfastcgi.py, and the logs indicate that I'm not creating an infinite number of processes.
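
For comparison, the ThreadPool variant mentioned above can be sketched as follows. It is an illustration only: `record_start` and `square` are hypothetical stand-ins for the real Initializer and LoadAndClassify. Because the workers are threads inside the current process, no new interpreter has to be launched and nothing has to be pickled to start them, which is consistent with them starting almost simultaneously.

```python
import threading
import time
from multiprocessing.pool import ThreadPool

start_times = []               # one entry per worker thread
start_lock = threading.Lock()  # guards start_times

def record_start(label):
    """Hypothetical initializer: runs once in every worker thread at startup."""
    with start_lock:
        start_times.append((label, time.time()))

def square(x):
    """Hypothetical stand-in for the real per-image work."""
    return x * x

def run_thread_pool(workers=4):
    pool = ThreadPool(processes=workers, initializer=record_start,
                      initargs=("worker",))
    try:
        results = pool.map(square, range(8))
    finally:
        pool.close()
        pool.join()  # after join, every worker has run its initializer
    return results

results = run_thread_pool()
stamps = [stamp for _, stamp in start_times]
print("workers started: %d, spread: %.4f s"
      % (len(stamps), max(stamps) - min(stamps)))
```

Threads share the interpreter lock, so this layout mainly helps when the work is dominated by disk I/O or by native code that releases the lock.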

What I want to know is why exactly multiprocessing.Pool does not create its worker processes simultaneously. I'd really appreciate any help.

EDIT 1: Here's my Initializer function:

def Initializer(classifier):
    global indexing_classifier
    logger.info('Worker started')
    indexing_classifier=classifier
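
One thing that may be worth checking is how expensive it is to hand `classifier` to each worker. On Windows there is no fork, so the arguments for every worker process, including `initargs`, are pickled and sent to the new process. Whether that accounts for the delay here is an assumption, not something the logs confirm. The sketch below times the serialization of a placeholder object; the `classifier` dictionary and the `measure_pickle_cost` helper are hypothetical and should be replaced with the real classifier.

```python
import pickle
from timeit import default_timer

def measure_pickle_cost(obj):
    """Return (payload size in bytes, seconds spent serializing) for obj."""
    started = default_timer()
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    elapsed = default_timer() - started
    return len(payload), elapsed

# Hypothetical stand-in for the real classifier: a large table of weights.
classifier = {"weights": [float(i) for i in range(100000)]}

size, seconds = measure_pickle_cost(classifier)
print("pickled size: %d bytes, serialization time: %.4f s" % (size, seconds))
```

If the real object takes seconds to serialize, loading it inside the initializer (for example from a file path passed through `initargs`) would be one way to avoid shipping it to every worker.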

Upvotes: 5

Views: 1750

Answers (1)

Brett Coover

Reputation: 1

I had many issues trying to run multiprocessing under CGI/WSGI. It works fine locally, but not on real web servers. Ultimately it just isn't compatible. If you need multiprocessing, send asynchronous jobs off to something like Celery.

Upvotes: 0
