Reputation: 7922
I have a list named results and a corresponding list of modules named modules, as well as some options that apply to all of the results and modules. Each module has a function save that needs to be called on the result that corresponds to it. So I can do something like the following:
from collections import deque

q = deque(results)
for module in modules:
    module.save(q.popleft(), options)
or equivalently:
for i in range(len(modules)):
    modules[i].save(results[i], options)
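For reference, the same pairing can also be written with zip, which walks both lists in step without the deque or index bookkeeping. The Module class, modules, results, and options below are toy stand-ins for the names in the question:

```python
class Module(object):
    # Hypothetical stand-in for one of the question's module objects.
    def __init__(self):
        self.saved = []

    def save(self, result, options):
        self.saved.append((result, options))

modules = [Module(), Module()]
results = ['r1', 'r2']
options = {'verbose': True}

# zip pairs each module with its corresponding result directly.
for module, result in zip(modules, results):
    module.save(result, options)
```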
This works great. However, there is no reason why I shouldn't do them all at the same time. But how to parallelize this in the best possible way is eluding me, even though it seems like it should be very straightforward. Can someone please point me in the right direction? Python 2.6.6, please.
Upvotes: 1
Views: 56
Reputation: 880459
If the items in results are picklable, then you can use Pool.apply_async to run module.save concurrently, like this:
import multiprocessing as mp
import itertools as IT
import logging

logger = mp.log_to_stderr(logging.DEBUG)
logger.setLevel(logging.DEBUG)

if __name__ == '__main__':
    pool = mp.Pool()
    for module, result in IT.izip(modules, results):
        pool.apply_async(module.save, args=(result, options))
    pool.close()
    pool.join()
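Here is a minimal, self-contained sketch of that apply_async pattern, with a toy top-level save() function (hypothetical) standing in for module.save; on Python 3 the built-in zip replaces itertools.izip. Functions passed to the pool must be picklable, which in practice means defined at module top level:

```python
import multiprocessing as mp

def save(result, options):
    # Hypothetical stand-in for module.save; defined at top level so it
    # can be pickled and sent to the worker processes.
    return result * options['factor']

if __name__ == '__main__':
    results = [1, 2, 3]
    options = {'factor': 10}
    pool = mp.Pool()
    # apply_async returns immediately; each call runs in a worker process.
    handles = [pool.apply_async(save, args=(r, options)) for r in results]
    pool.close()
    pool.join()
    print([h.get() for h in handles])  # -> [10, 20, 30]
```

Each AsyncResult handle's get() re-raises any exception that occurred in the worker, so collecting them is also how you surface failures.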
In module.py:
import logging
logger = logging.getLogger(__name__)
def save(result, options):
    logger.debug('Starting save')
    ...
    logger.debug('Exiting save')
When run with logger.setLevel(logging.DEBUG), you'll see lots of debugging messages which will help you understand where Python is in the code for each process.
To silence logging, simply change that line to
logging.disable(logging.CRITICAL)
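A minimal sketch of the two modes side by side (the same logging API exists in Python 2.6 and 3):

```python
import logging
import multiprocessing as mp

# Verbose mode: emit multiprocessing's own DEBUG records to stderr.
logger = mp.log_to_stderr(logging.DEBUG)
logger.setLevel(logging.DEBUG)

# Silent mode: suppress every record at CRITICAL level and below,
# process-wide, regardless of individual logger levels.
logging.disable(logging.CRITICAL)
logger.debug('this message is suppressed')
```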
Upvotes: 1