Reputation: 689
I am quite new to the multiprocessing library and have a question about its Pool class when used with map(). Suppose I have 4 worker threads and 6 tasks to be completed. What I do is this (using multiprocessing.dummy because I want to spawn threads and not processes):
```python
from multiprocessing.dummy import Pool as ThreadPool


def print_it(num):
    print(num)


def multi_threaded():
    tasks = [1, 2, 3, 4, 5, 6]
    pool = ThreadPool(4)  # 4 worker threads
    r = pool.map(print_it, tasks)  # blocks until all 6 tasks are done
    pool.close()
    pool.join()


multi_threaded()
```
I want to understand how Pool.map() handles the tasks. Three options:
This insight would help me use Pool.map() more effectively in production.
Upvotes: 4
Views: 760
Reputation: 12205
It depends on how you define your pool.
With the pool defined as in your example, your option (2) is what happens. The worker threads or processes (depending on which Pool you use) are launched as soon as you initialise the Pool (this happens in Pool.__init__(); there is no need to submit tasks first), and they sit there waiting for tasks. When a task arrives and is executed, the workers do not exit; they just go back to waiting for more work.
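A minimal sketch to observe this with the thread pool from the question: it counts how many new threads exist right after `ThreadPool(4)` is created, before any task is submitted, and records which thread ran each of the 6 tasks. The helper name `which_thread` is hypothetical. The thread count also includes the pool's internal handler threads and may differ between Python versions, so only the bounds are meaningful.

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool


def which_thread(num):
    # Return the task number and the identifier of the thread that ran it.
    return num, threading.get_ident()


baseline = threading.active_count()
pool = ThreadPool(4)
# The 4 worker threads (plus the pool's internal handler threads) already
# exist at this point, even though no task has been submitted yet.
started = threading.active_count() - baseline

results = pool.map(which_thread, [1, 2, 3, 4, 5, 6])
# Workers are reused, so the 6 tasks are spread over at most 4 distinct threads.
distinct_workers = len({ident for _, ident in results})
pool.close()
pool.join()

print("threads started by ThreadPool(4):", started)
print("distinct threads that ran the 6 tasks:", distinct_workers)
```

The exact split of tasks between threads is up to the scheduler; the point is that no thread is created or destroyed per task.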
You can make it work differently, though, by passing the maxtasksperchild parameter to your pool. As soon as a worker has completed that many tasks, it exits, and a new worker is launched right away (it does not need to be given a task first; it is started as soon as the old worker exits). This is managed in the Pool class by the Pool._maintain_pool() and Pool._repopulate_pool() methods. Note that maxtasksperchild is a parameter of the process-based multiprocessing.Pool; the thread-based multiprocessing.dummy.Pool (ThreadPool) does not accept it.
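A rough sketch of the difference maxtasksperchild makes. It uses multiprocessing's process pool, because the thread-based ThreadPool does not take that parameter. It assumes a POSIX system where the "fork" start method is available, which lets the snippet run at module level; with the spawn start method (the default on Windows and macOS) you would drop `get_context("fork")` and put the calls under an `if __name__ == "__main__":` guard. `whoami` and `run` are hypothetical names. Each task reports the PID of the worker that ran it, and `chunksize=1` makes every list item count as one task.

```python
import multiprocessing
import os


def whoami(num):
    # Return the task number together with the PID of the worker that ran it.
    return num, os.getpid()


def run(maxtasksperchild=None):
    # Assumption: POSIX "fork" start method, so no __main__ guard is needed.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(2, maxtasksperchild=maxtasksperchild) as pool:
        # map() groups items into chunks and each chunk is one task;
        # chunksize=1 makes every item its own task.
        return pool.map(whoami, [1, 2, 3, 4, 5, 6], chunksize=1)


reused = run()                        # 2 long-lived workers
recycled = run(maxtasksperchild=1)    # each worker exits after one task
print("default:", len({pid for _, pid in reused}), "distinct worker PIDs")
print("maxtasksperchild=1:", len({pid for _, pid in recycled}), "distinct worker PIDs")
```

With the default settings the six tasks land on at most two PIDs; with `maxtasksperchild=1` every task runs in a freshly started worker, so you would typically see six distinct PIDs.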
If you want your workers to launch at start and run indefinitely, keep doing what you do now. If you want them to launch at start but exit and be replaced after a number of tasks (even one, if necessary), use maxtasksperchild. If you do not want to launch processes or threads before there is a need for them, do not use Pool; launch threads or processes when you need them and manage them yourself.
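If you go the manual route, here is one way to sketch it with only the standard threading module: each task gets its own thread, created at the moment that task is started and exiting as soon as it finishes, with a semaphore capping how many run at once (4 here, to mirror the question). `run_on_demand` and its `max_threads` parameter are hypothetical names for illustration, not part of any library.

```python
import threading


def run_on_demand(func, tasks, max_threads=4):
    # Hypothetical helper: one short-lived thread per task, created only
    # when that task is about to run, with at most max_threads alive at once.
    results = [None] * len(tasks)
    slots = threading.BoundedSemaphore(max_threads)

    def worker(index, item):
        try:
            # Each thread writes only to its own slot, so no lock is needed.
            results[index] = func(item)
        finally:
            slots.release()  # free a slot so the next thread can start

    threads = []
    for index, item in enumerate(tasks):
        slots.acquire()  # wait until fewer than max_threads are running
        t = threading.Thread(target=worker, args=(index, item))
        t.start()  # the thread exists only from this point on
        threads.append(t)

    for t in threads:
        t.join()  # each thread has exited once its single task is done
    return results


squares = run_on_demand(lambda n: n * n, [1, 2, 3, 4, 5, 6])
print(squares)  # [1, 4, 9, 16, 25, 36]
```

The trade-off is that you pay thread start-up cost per task and handle errors and limits yourself, which is exactly the bookkeeping Pool does for you.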
Hope this helps.
Upvotes: 1