Tomasz Przemski

Reputation: 1127

Difference between map() and imap() in multiprocessing calculations

I have a piece of code with a multiprocessing implementation:

import itertools
import multiprocessing as mul

import tqdm

# aaa, aab and aba are the lists of triples described below
q = range(len(aaa))
w = range(len(aab))
e = range(len(aba))

paramlist = list(itertools.product(q, w, e))

def f(combinations):
    q = combinations[0]
    w = combinations[1]
    e = combinations[2]
    # the rest of the function

if __name__ == '__main__':
    pool = mul.Pool(4)
    res_p = pool.map(f, paramlist)

    for _ in tqdm.tqdm(res_p, total=len(paramlist)):
        pass

    pool.close()
    pool.join()

Here aaa, aab and aba are lists of triples, for example:

aaa = [[1,2,3], [3,5,1], ...], etc.

I wanted to use imap() so that I could follow the progress of the calculation with the tqdm module. Why does map() give me the correct length for list(res_p), while the list is empty when I switch to imap()? Is it possible to track progress with map()?

Upvotes: 0

Views: 2307

Answers (1)

Thijs van Dien

Reputation: 6616

tqdm can't show progress with map because map is blocking: it waits for all results and then returns them as a list. By the time your loop runs, the only progress left to make is what happens in that loop; the parallel phase has already been completed.
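A rough way to see this is to time the map call on its own. The sketch below is only an illustration: slow_square and timed_map are made-up names, and the 0.1 s sleep stands in for real work. Practically all of the elapsed time should be spent inside pool.map, before any loop over the results could start.

```python
import multiprocessing as mul
import time

def slow_square(x):
    # Hypothetical stand-in for an expensive task.
    time.sleep(0.1)
    return x * x

def timed_map(pool, func, items):
    # Hypothetical helper: measure how long the map() call itself takes.
    start = time.perf_counter()
    results = pool.map(func, items)  # blocks until every task has finished
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == '__main__':
    with mul.Pool(2) as pool:
        results, elapsed = timed_map(pool, slow_square, range(8))
    # results is already a complete list at this point
    print(type(results).__name__, len(results), f"{elapsed:.2f}s spent in map()")
```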

imap does not block in that way, since it returns just an iterator, i.e. a thing you can repeatedly ask for the next result. Only when you do that, by looping over it, does it wait for each result in turn. Because it is an iterator, it is empty once all results have been consumed (at the end of your loop), so there is nothing left to put in a list. If you wish to keep the results, you could append each one inside the loop, for example, or change the code to this:

res_p = list(tqdm.tqdm(pool.imap(f, paramlist), total=len(paramlist)))

for res in res_p:
    ... # Do stuff
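The other option mentioned above, appending each result inside the loop, might look roughly like the sketch below. The names square and collect_with_progress are invented for the example, and a plain counter printed to stderr stands in for tqdm so that the snippet has no third-party dependency. It also shows the imap iterator coming up empty on a second pass.

```python
import multiprocessing as mul
import sys

def square(x):
    # Hypothetical stand-in for the real worker function f.
    return x * x

def collect_with_progress(pool, func, items):
    # Hypothetical helper: keep each result while reporting progress.
    items = list(items)
    results = []
    for done, result in enumerate(pool.imap(func, items), start=1):
        results.append(result)  # store the result as soon as it arrives
        print(f"\r{done}/{len(items)} done", end="", file=sys.stderr)
    print(file=sys.stderr)
    return results

if __name__ == '__main__':
    with mul.Pool(4) as pool:
        lazy = pool.imap(square, range(5))
        first_pass = list(lazy)   # consumes the iterator
        second_pass = list(lazy)  # the iterator is exhausted by now
        print(len(first_pass), len(second_pass))
        print(collect_with_progress(pool, square, range(5)))
```

Since imap yields results in input order, the collected list lines up with paramlist in the same way the list returned by map does.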

Upvotes: 3
