Reputation: 2171
I have just studied this part of the documentation.
So, as far as I understood it, this function

import multiprocessing

def f(x):
    return x * x

pool = multiprocessing.Pool()
print(pool.map(f, range(10)))

would split the tasks into chunks, with the chunk size equal to the number of cores. And the results will be in the same order as the input sequence.
The docs also say that map() will block until complete.

Let's imagine f above is a complex function. We have 4 CPUs and therefore a chunk size of 4. Does it block until all 4 have finished, and only then get the next chunk?

So in the worst case, 3 free cores would idle for a long time until the last one finishes?
Upvotes: 2
Views: 165
Reputation: 155684
You seem to be under the impression that the chunksize will match the number of cores. This is not correct. When not specified, chunksize has an implementation-defined value, and it's not equal to the number of cores, at least on CPython (the reference interpreter). At time of writing, on both Python 2.7 and 3.7, the computation used is:

if chunksize is None:
    chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
    if extra:
        chunksize += 1

len(self._pool) is the number of worker processes, and len(iterable) is the number of items in the input iterable (which is list-ified if it didn't have a defined length).
So for your case, the calculation is:
chunksize, extra = divmod(10, numcores * 4)
if extra:
    chunksize += 1
which for a four-core machine (for example) would compute chunksize, extra = 0, 10, and the if check would then change chunksize to 1. So each worker would take a single input value (0, 1, 2 and 3 would be grabbed almost immediately), then as each worker finished, it would grab one more item. Assuming all items take roughly the same amount of time, you'd do two rounds with full occupancy (4/4 cores in use), then one round with half occupancy (2/4 cores in use). Your worst-case scenario is that the last task to begin takes the longest to run. If this is knowable ahead of time, you should try to organize your inputs to prevent that (placing the most expensive items first, so the final tasks run with incomplete occupancy are short and finish fast, maximizing parallelism); otherwise, it's pretty unavoidable.
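To see the computation above in action, here is a small sketch that mirrors CPython's default chunksize formula (the helper name default_chunksize is my own, not part of the stdlib):

```python
def default_chunksize(n_items, n_workers):
    # Mirrors CPython's Pool.map default: divide the input into roughly
    # 4 chunks per worker, rounding up when there is a remainder.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

print(default_chunksize(10, 4))   # 1  (the questioner's case)
print(default_chunksize(100, 4))  # 7
```
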
For a larger number of tasks, yes, the default chunksize will increase, e.g. for 100 inputs on four cores, you'd have a chunksize of 7, producing 15 chunks, the last of which is undersized. So yes, for tasks with wildly varying runtimes, you'd risk a long tail with low occupancy. If that's a risk, explicitly set your chunksize to 1; it reduces overall performance (bringing it closer to that of imap), but it removes the possibility of one worker working on item 1 of 7 in a chunk with all the other cores sitting idle.
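As a concrete sketch of that advice (slow_square is a stand-in function of my own, not from the docs):

```python
import multiprocessing

def slow_square(x):
    # Stand-in for a task with wildly varying runtime.
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        # chunksize=1 hands out one item at a time, so a slow item never
        # traps other queued items behind it in the same chunk.
        results = pool.map(slow_square, range(10), chunksize=1)
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Results still come back in input order; only the scheduling granularity changes.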
Upvotes: 3
Reputation: 4765
You are partially right.

You can also read that map accepts a chunksize parameter that can be used to tune the size of task chunks submitted to pool processes. If the chunks are small enough, each process should be fed fairly equally and all cores will be working most of the time.
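To illustrate the trade-off (a sketch of my own, not from the docs): smaller chunks spread work more evenly across workers, but produce more chunks and hence more inter-process overhead.

```python
import math

# Number of chunks pool.map would submit for 100 inputs at various
# chunksize values: smaller chunks balance better, cost more IPC.
for chunksize in (1, 2, 5, 25):
    n_chunks = math.ceil(100 / chunksize)
    print(f"chunksize={chunksize}: {n_chunks} chunks")
```
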
Upvotes: 0