Reputation: 39
I wrote a Python program that launches 16 parallel processes using Pool to process some files. At the beginning of the run, all 16 processes stay busy until almost all files have been processed. Then, for some reason I don't understand, when there are only a few files left, only one process runs at a time, which makes processing take much longer than necessary. Could you help with this?
Upvotes: 1
Views: 1112
Reputation: 514
Force map()
to use a chunksize of 1 instead of letting it guess the best value by itself, e.g.:
pool = Pool(16)
pool.map(func, iterable, chunksize=1)
This should (in theory) guarantee the best distribution of load among workers until the end of the input data.
See here
Upvotes: 1
Reputation: 1931
Before Python starts executing the function you pass to Pool's apply_async/map_async (or map), it assigns each worker a chunk of the work.
For example, let's say you have 8 files to process and you start a Pool with 4 workers.
Before the file processing starts, two specific files can be assigned to each worker. This means that if a worker finishes its share earlier than the others, it will simply "take a break" and will not start helping the others.
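A rough sketch of why this bites at the end of a run: CPython's Pool.map picks a default chunksize of approximately len(iterable) / (workers * 4), rounded up (see multiprocessing/pool.py; exact details may vary between versions). The numbers below (1000 files, 16 workers, matching the question) are for illustration:

```python
# Approximation of the default chunksize heuristic in CPython's
# Pool.map; this is a sketch of the idea, not the exact upstream code.
def default_chunksize(n_items, n_workers):
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

# With 1000 files and 16 workers, each task dispatched to a worker
# covers a batch of files at once, not a single file.
print(default_chunksize(1000, 16))
```

With a chunksize like this, the last worker can be left grinding through its final multi-file chunk alone while the others idle, which matches the tail behaviour the question describes; chunksize=1 avoids that at the cost of more dispatch overhead.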
Upvotes: 0