james0011

Reputation: 39

Python multiprocessing pool number of jobs not correct

I wrote a Python program that uses a multiprocessing pool to launch 16 parallel processes to handle some files. At the beginning of the run, 16 processes stay busy until almost all the files have been processed. Then, for reasons I don't understand, when only a few files are left, just one process runs at a time, which makes processing take much longer than necessary. Could you help with this?

Upvotes: 1

Views: 1112

Answers (2)

Alberto Re

Reputation: 514

Force map() to use a chunksize of 1 instead of letting it guess the best value by itself, e.g.:

from multiprocessing import Pool

pool = Pool(16)
pool.map(func, iterable, chunksize=1)

This should (in theory) guarantee the best distribution of load among workers until the end of the input data.
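A runnable sketch of the idea (`slow_task` and `run_with_chunksize_one` are hypothetical names used only for illustration):

```python
from multiprocessing import get_context
import time

def slow_task(x):
    time.sleep(0.01)  # stand-in for real per-file work
    return x * x

def run_with_chunksize_one(n_items, workers):
    # chunksize=1 hands out one item at a time, so a worker that finishes
    # early immediately picks up the next pending item instead of idling.
    # The "fork" start method is Unix-only; drop get_context() on Windows.
    with get_context("fork").Pool(workers) as pool:
        return pool.map(slow_task, range(n_items), chunksize=1)

if __name__ == "__main__":
    print(run_with_chunksize_one(20, 4))
```

The trade-off is more inter-process communication overhead per item, which only pays off when each item's processing time dominates, as it does for file processing.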


Upvotes: 1

Xxxo

Reputation: 1931

Before Python starts executing the function you pass to Pool's apply_async/map_async, it assigns each worker a chunk of the work.

For example, let's say that you have 8 files to process and you start a Pool with 4 workers.

Before the file processing starts, two specific files will be assigned to each worker. This means that if a worker finishes its share earlier than the others, it will simply "have a break" and will not start helping the others.
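The pre-assignment comes from map's default chunksize heuristic, which can be sketched like this (`default_chunksize` is a hypothetical name; the divmod logic mirrors what CPython's `multiprocessing.pool` does when no chunksize is given):

```python
def default_chunksize(n_items, n_workers):
    # Approximation of CPython's heuristic in multiprocessing.pool:
    # split the input into roughly 4 chunks per worker.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

if __name__ == "__main__":
    # With the asker's 16 workers and, say, 1000 files, each dispatched
    # chunk holds 16 files, so the tail of the run can sit on one worker.
    print(default_chunksize(1000, 16))
```

(For the small 8-files/4-workers example above, this heuristic actually yields a chunksize of 1; the batching effect shows up with larger inputs like the asker's.)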

Upvotes: 0
