Reputation: 3291
Pebble's process pools take parameters for max_workers and max_tasks.
https://pythonhosted.org/Pebble/#pools
The description of max_tasks is a little unclear:
"If max_tasks is a number greater than zero each worker will be restarted after performing an equal amount of tasks."
My question is:
What if it is not greater than zero? How does it behave then?
What does it mean to restart a worker? Let's say max_tasks is 5. Then will each process execute 5 tasks, then be killed, and a new process started in its place? What is the benefit of doing this?
I know that other libraries allow you to customize pool maps depending on whether each task is expected to take similar time to complete, or not. Is that relevant here?
In general, what guidelines are there for setting max_tasks?
I'm running a function that needs to be applied to every element of a list of roughly 160,000 items. It's completely parallelizable, and my server has 8 cores. Each function call will take approximately the same time to complete, at most 3 times longer than the average.
Thanks.
Upvotes: 1
Views: 1733
Reputation: 15050
The max_tasks parameter is analogous to the maxtasksperchild parameter of multiprocessing.Pool. The Python 2 documentation explains the purpose of this parameter:
"Worker processes within a Pool typically live for the complete duration of the Pool’s work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user."
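For reference, this is what that pattern looks like with multiprocessing itself. This is a minimal sketch: the process_element function and the value 1000 are illustrative placeholders, not something taken from the question.

```python
from multiprocessing import Pool

def process_element(x):
    # Stand-in for the real per-element work.
    return x * x

if __name__ == '__main__':
    # Each worker process is recycled after completing 1000 tasks
    # (1000 is an arbitrary illustrative value).
    pool = Pool(processes=8, maxtasksperchild=1000)
    results = pool.map(process_element, range(160000))
    pool.close()
    pool.join()
```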
In other words, you use max_tasks if you want to limit the resource growth the worker processes can sustain. It is useful when dealing with libraries which leak memory or file descriptors, for example. Another use case is limiting the memory wasted by memory fragmentation occurring within the process.
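Applied to the scenario in the question (8 cores, a list of roughly 160,000 elements), a minimal Pebble sketch could look like the following. Here process_element and the max_tasks value of 1000 are illustrative assumptions; and per the quoted description, leaving max_tasks at its default of 0 should mean workers are never restarted.

```python
from pebble import ProcessPool

def process_element(x):
    # Stand-in for the real per-element work.
    return x * x

if __name__ == '__main__':
    elements = list(range(160000))  # stand-in for the real 160,000-element list

    # max_workers=8 matches the 8 available cores; max_tasks=1000 is an
    # illustrative value: each worker process is replaced after completing
    # 1000 tasks, bounding any per-process leak or fragmentation growth.
    with ProcessPool(max_workers=8, max_tasks=1000) as pool:
        future = pool.map(process_element, elements)
        results = list(future.result())
```

Note that max_tasks is about resource recycling, not load balancing, so if the function does not leak resources it can usually be left at its default.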
Upvotes: 1