Neil

Reputation: 3291

Python Pebble ProcessPool how to set max_tasks

Pebble's process pools take parameters for max_workers and max_tasks.

https://pythonhosted.org/Pebble/#pools

The description of max_tasks is a little unclear:

"If max_tasks is a number greater than zero each worker will be restarted after performing an equal amount of tasks."

My question is:

I'm running a function that needs to be applied to every element of a list of roughly 160,000 elements. The work is completely parallelizable, and my server has 8 cores. Each function call takes approximately the same time to complete, at most 3 times longer than the average. Given that, how should I set max_tasks?
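For reference, a simplified sketch of my setup (the function and variable names are placeholders, and the exact Pebble API may differ slightly between versions):

```python
from pebble import ProcessPool

def process(item):
    # placeholder for the real per-element work
    return item

items = range(160000)  # stands in for my actual list

with ProcessPool(max_workers=8) as pool:
    # map the function over the list across 8 worker processes
    future = pool.map(process, items)
    results = list(future.result())
```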

Thanks.

Upvotes: 1

Views: 1733

Answers (1)

noxdafox

Reputation: 15050

The max_tasks parameter is analogous to maxtasksperchild in multiprocessing.Pool. The Python 2 documentation explains the purpose of this parameter:

Worker processes within a Pool typically live for the complete duration of the Pool's work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc.) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before exiting, being cleaned up, and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.

In other words, you use max_tasks when you want to limit the growth of resources held by the worker processes. It is useful, for example, when dealing with libraries that leak memory or file descriptors. Another use case is limiting the memory wasted by fragmentation within each worker process.
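As a rough sketch of what that looks like (the value 100 for max_tasks is purely illustrative; pick whatever keeps resource usage acceptable in your case):

```python
from pebble import ProcessPool

def work(item):
    # stands in for a function that, e.g., calls into a leaky library
    return item * 2

with ProcessPool(max_workers=8, max_tasks=100) as pool:
    # each worker process is recycled after completing 100 tasks,
    # releasing any memory or file descriptors it accumulated
    futures = [pool.schedule(work, args=(i,)) for i in range(1000)]
    results = [f.result() for f in futures]
```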

Upvotes: 1
