rhombidodecahedron

Reputation: 7922

Distributing tasks to modules in Python

I have a list named results and a corresponding list of modules named modules, as well as some options that apply to all of the results and modules. Each module has a function save that needs to be called on the result that corresponds to it. So I can do something like the following:

from collections import deque

q = deque(results)
for module in modules:
    module.save(q.popleft(), options)

or equivalently:

for i in range(len(modules)):
    modules[i].save(results[i], options)

This works fine. However, there is no reason not to run them all at the same time, yet the best way to parallelize this is eluding me, even though it seems like it should be very straightforward. Can someone please point me in the right direction? I'm on Python 2.6.6.

Upvotes: 1

Views: 56

Answers (1)

unutbu

Reputation: 880459

If the items in results are picklable, then you can use Pool.apply_async to run module.save concurrently, like this:

import multiprocessing as mp
import itertools as IT
import logging

logger = mp.log_to_stderr(logging.DEBUG)
logger.setLevel(logging.DEBUG)

if __name__ == '__main__':
    pool = mp.Pool()
    # submit one save call per (module, result) pair; each runs in a worker process
    for module, result in IT.izip(modules, results):
        pool.apply_async(module.save, args=(result, options))

    pool.close()
    pool.join()   # wait for all the workers to finish
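If you also want to know whether each call succeeded, one option (a sketch, not part of the original answer; modules, results and options are assumed to be defined as in the question) is to keep the AsyncResult objects that apply_async returns and call get() on them afterwards. get() re-raises any exception that was raised inside module.save, so failures don't pass silently:

import multiprocessing as mp
import itertools as IT

if __name__ == '__main__':
    pool = mp.Pool()
    # keep the AsyncResult handles so we can check each task later
    async_results = [pool.apply_async(module.save, args=(result, options))
                     for module, result in IT.izip(modules, results)]
    pool.close()
    pool.join()

    # get() returns the task's return value, or re-raises the exception
    # that occurred in the worker process
    for r in async_results:
        r.get()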

In module.py:

import logging
logger = logging.getLogger(__name__)

def save(result, options):
    logger.debug('Starting save')
    ...
    logger.debug('Exiting save')

When run with logger.setLevel(logging.DEBUG), you'll see lots of debugging messages that will help you understand where each process is in the code.

To silence logging, simply change that line to

logging.disable(logging.CRITICAL)
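For context, a minimal sketch of how the logging setup at the top of the main script would then look (same names as above):

import multiprocessing as mp
import logging

logger = mp.log_to_stderr(logging.DEBUG)
logging.disable(logging.CRITICAL)   # suppress every message at CRITICAL and below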

Upvotes: 1
