Reputation: 4908
I'm doing some slow computations using networkx
(Network analysis library), and I'm trying to use Pool workers to make it somewhat faster. The computations are independent so should be relatively straightforward to parallelize them.
def computeInfoPooled(G,num_list):
pool=Pool(processes=4)
def f(k):
curr_stat={}
curr_stat[k]=slow_function(k,G)
return curr_stat
result = pool.map(f,num_list)
return result
Now, I ran the following in console:
computed_result=computeInfoPooled(G)
I would expect this code to create 4 processes, and call f with every item (a number) of num_list in a different process. If num_list contains more than 4 numbers (in my case it's about 300), it would just run 4 at the same time and queue the rest until one of the pooled workers is done.
What happened when I ran my code is that many python.exe where being spawned (or forked, not sure what's happening) and it seemed that it was creating infinitely many processes, so I had to unplug my machine.
Any ideas what I'm doing wrong and how I could fix it?
Upvotes: 2
Views: 241
Reputation: 366133
This is explained in the docs under Programming Guidelines for Windows.
Depending on your platform, each process may have to start a brand-new interpreter and import
your module to get that f
function to call. (On Windows, it always has to do so.) When it import
s your module, all top-level code is run, which includes the line computed_result=computeInfoPooled(G)
, which creates a whole new pool of 4 processes, etc.
You get around this the same way you deal with any other case where you want the same file to be `both importable as a module and runnable as a script:
def computeInfoPooled(G,num_list):
pool=Pool(processes=4)
def f(k):
curr_stat={}
curr_stat[k]=slow_function(k,G)
return curr_stat
result = pool.map(f,num_list)
return result
if __name__ == '__main__':
computed_result=computeInfoPooled(G)
From your edit, and your comments, you seem to be expecting that doing the call to computeInfoPooled(G)
from the interactive interpreter will solve that problem. The same linked docs section explains in detail why that don't work, and a big note at the very top of the Introduction directly says:
Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter.
If you want to understand why this is true, you need to read the linked docs (and you will also need to understand a little about how import
, pickle
, and multiprocessing
all work).
Upvotes: 0
Reputation: 880937
On Windows you need
if __name__ == '__main__':
computed_result = computeInfoPooled(G)
to make your script importable without starting a fork bomb. (Read the section entitled "Safe importing of main module" in the docs.
Also note that on Windows you may not be able to use the multiprocessing module from the interactive interpreter. See the warning near the top of the docs:
Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter. (my emphasis.)
Instead, save the script to a file, e.g. script.py
and run it from the command line:
python script.py
In addition, you need the arguments to pool.map
to be picklable.
The function f
needs to be defined at the module level (not inside computeInfoPooled
to be picklable:
def f(k):
curr_stat = slow_function(k, G)
return k, curr_stat
def computeInfoPooled(G, num_list):
pool = Pool(processes=4)
result = pool.map(f, num_list)
return dict(result)
if __name__ == '__main__':
computed_result = computeInfoPooled(G)
By the way, if f
returns a dict, then pool.map(f, ...)
will return a list of dicts. I'm not sure that is what you'd want, especially since each dict would only have one key-value pair.
Instead, if you let f
return a (key, value) tuple, then pool.map(f, ...)
will return a list of tuples, which you could then turn into a dict with dict(result)
.
Upvotes: 1