Reputation: 33235
The ordering of results from the iterator returned by imap_unordered is arbitrary, and it doesn't seem to run any faster than imap (which I checked with the following code), so why would one use this method?
from multiprocessing import Pool
import time

def square(i):
    time.sleep(0.01)  # simulate a small, constant amount of work
    return i ** 2

p = Pool(4)
nums = range(50)

start = time.time()
print('Using imap')
for i in p.imap(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))

start = time.time()
print('Using imap_unordered')
for i in p.imap_unordered(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))
Upvotes: 37
Views: 33676
Reputation: 1391
imap_unordered also seems to use less memory over time than imap. At least that's what I experienced with an iterator over millions of items.
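One plausible reason is that imap has to yield results in input order, so any result that finishes early must be held somewhere until everything ahead of it is done, whereas imap_unordered can hand each result over as soon as it arrives. The sketch below is a pure-Python simulation of that idea, not the library's actual code; max_held_back and its completion_order argument are hypothetical names made up for illustration.

```python
def max_held_back(completion_order):
    """Simulate an ordered consumer.

    completion_order lists input indices in the order the tasks finish.
    Returns the largest number of finished results that had to wait
    because an earlier input was still being computed.
    """
    pending = set()   # finished results that cannot be yielded yet
    next_index = 0    # index the ordered consumer is waiting for
    peak = 0
    for index in completion_order:
        pending.add(index)
        # Release every result that is now next in line.
        while next_index in pending:
            pending.remove(next_index)
            next_index += 1
        peak = max(peak, len(pending))
    return peak

# Tasks finish in input order: nothing ever has to wait.
print(max_held_back([0, 1, 2, 3]))
# Task 0 is slow and finishes last: results 1, 2 and 3 pile up behind it.
print(max_held_back([1, 2, 3, 0]))
```

In this model an unordered consumer never holds anything back, which would fit the lower memory use I saw, but I haven't measured where the memory actually goes.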
Upvotes: 14
Reputation: 104792
Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by too much.
What it may do, however, is make the interval between values becoming available in your iteration more even. That is, if you have operations that can take very different amounts of time (rather than the consistent 0.01 seconds you were using in your example), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated values. The regular imap will delay yielding the faster ones until after the slower ones ahead of them have been computed (but this does not delay the worker processes moving on to more calculations, just the time for you to see them).
Try making your work function sleep for i*0.1 seconds, shuffling your input list and printing i in your loops. You'll be able to see the difference between the two imap versions. Here's my version (the main function and the if __name__ == '__main__' boilerplate are required for it to run correctly on Windows):
from multiprocessing import Pool
import time
import random

def work(i):
    time.sleep(0.1 * i)  # larger inputs take proportionally longer
    return i

def main():
    p = Pool(4)
    nums = list(range(50))  # a list, so random.shuffle can reorder it in place
    random.shuffle(nums)

    start = time.time()
    print('Using imap')
    for i in p.imap(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

    start = time.time()
    print('Using imap_unordered')
    for i in p.imap_unordered(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

if __name__ == "__main__":
    main()
The imap version will have long pauses while values like 49 are being handled (taking 4.9 seconds), then it will fly over a bunch of other values (which were calculated by the other processes while we were waiting for 49 to be processed). In contrast, the imap_unordered loop will usually not pause nearly as long at one time. It will have more frequent, but shorter pauses, and its output will tend to be smoother.
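If you'd rather see numbers than watch the output scroll, here is a smaller sketch that records when each value arrives. To keep it short it uses multiprocessing.pool.ThreadPool, which offers the same imap and imap_unordered methods but runs the work in threads, so it is a stand-in for Pool rather than the same thing. The timed_results helper and the four delays are made up for illustration.

```python
from multiprocessing.pool import ThreadPool
import time

def work(delay):
    time.sleep(delay)
    return delay

def timed_results(method_name, delays, workers=4):
    """Return (value, seconds since start) for each result, in arrival order."""
    pool = ThreadPool(workers)
    arrivals = []
    start = time.time()
    try:
        # method_name is 'imap' or 'imap_unordered'
        for value in getattr(pool, method_name)(work, delays):
            arrivals.append((value, round(time.time() - start, 2)))
    finally:
        pool.close()
        pool.join()
    return arrivals

# One slow task first, followed by faster ones; one worker per task.
delays = [0.5, 0.1, 0.2, 0.3]
ordered = timed_results('imap', delays)
unordered = timed_results('imap_unordered', delays)
print(ordered)    # values come back in input order, all held up by the 0.5 task
print(unordered)  # values come back as they finish, fastest first
```

The exact timings will vary from run to run, but with imap nothing should show up until the slow first task is done, while with imap_unordered the arrivals should be spread out over the same total time.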
Upvotes: 53