Difference between map() and imap() in multiprocessing

Question

I have a piece of code with a multiprocessing implementation:

import itertools
import multiprocessing as mul

import tqdm

q = range(len(aaa))
w = range(len(aab))
e = range(len(aba))

paramlist = list(itertools.product(q, w, e))

def f(combinations):
    q = combinations[0]
    w = combinations[1]
    e = combinations[2]
    # the rest of the function

if __name__ == '__main__':
    pool = mul.Pool(4)
    res_p = pool.map(f, paramlist)

    for _ in tqdm.tqdm(res_p, total=len(paramlist)):
        pass

    pool.close()
    pool.join()

Here aaa, aab, and aba are lists of three-element lists, for example:

aaa = [[1,2,3], [3,5,1], ...], etc.

I wanted to switch to imap() so that I could follow the progress of the calculation with tqdm. Why does len(list(res_p)) give the correct length with map(), while with imap() list(res_p) comes out empty? And is there a way to track progress with map() itself?

Thijs van Dien · Accepted Answer

tqdm can't show progress with map because map blocks: it waits for all results and only then returns them as a list. By the time your loop runs, the parallel phase has already finished, so the only progress left to display is the loop itself.
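
A rough way to see the blocking for yourself is to time the map call. This is only a sketch: slow_square and timed_map are made-up names, and I use ThreadPool (the thread-backed pool with the same map/imap interface) as a stand-in so the snippet runs without an `if __name__ == '__main__':` guard. I would expect mul.Pool to behave the same way here.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Hypothetical stand-in for the real work function."""
    time.sleep(0.05)
    return x * x


def timed_map(items, workers=4):
    """Return map()'s result and how long the call itself took."""
    with ThreadPool(workers) as pool:
        start = time.perf_counter()
        results = pool.map(slow_square, items)  # blocks until every item is done
        elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = timed_map(range(8))
print(type(results).__name__, results)
print(f"map() returned after {elapsed:.2f}s")
```

All of the waiting happens inside the map call, and what comes back is a finished list, so a progress bar wrapped around that list has nothing left to measure.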

imap does not block; it returns an iterator right away, i.e. an object you can ask for the next result, and the next, and so on. Each result is waited for only when you request it by looping over the iterator, one after another. Because it is an iterator, it is empty once all results have been consumed (at the end of your loop), so there is nothing left to put in a list. If you want to keep the results, you could append each one inside the loop, for example, or change the code to this:

res_p = list(tqdm.tqdm(pool.imap(f, paramlist), total=len(paramlist)))

for res in res_p:
    ... # Do stuff
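
The other option mentioned above, appending inside the loop, could look roughly like this. Again a sketch with made-up names (square, collect_with_progress), a plain counter instead of tqdm, and ThreadPool standing in for mul.Pool so it runs without the main guard. It also shows that the iterator has nothing left once the loop is done.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical stand-in for the real work function."""
    return x * x


def collect_with_progress(items, workers=4):
    """Consume imap() one result at a time, keeping each result."""
    results = []
    with ThreadPool(workers) as pool:
        lazy = pool.imap(square, items)  # returns an iterator immediately
        for done, res in enumerate(lazy, start=1):
            results.append(res)  # keep the result before it is gone
            print(f"{done}/{len(items)} done")
        leftover = list(lazy)  # the iterator is already exhausted here
    return results, leftover


results, leftover = collect_with_progress(list(range(5)))
print(results)
print(leftover)
```

imap yields results in the order of the inputs, so results lines up with items, while leftover is empty because the loop already consumed everything.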
