agdhruv

Reputation: 640

If I want to give more work to my Process Pool, can I call Pool.join() before Pool.close()?

The documentation for multiprocessing states the following about Pool.join():

Wait for the worker processes to exit. One must call close() or terminate() before using join().

I know that Pool.close() prevents any other task from being submitted to the pool; and that Pool.join() waits for the pool to finish before proceeding with the parent process.

So, why can I not call Pool.join() before Pool.close() in the case when I want to reuse my pool for performing multiple tasks and then finally close() it much later? For example:

pool = Pool()
pool.map(do1)
pool.join() # need to wait here for synchronization
.
.
.
pool.map(do2)
pool.join() # need to wait here again for synchronization
.
.
.
pool.map(do3)
pool.join() # need to wait here again for synchronization
pool.close()

# program ends

Why must one "call close() or terminate() before using join()"?

Upvotes: 17

Views: 1682

Answers (4)

Kraigolas

Reputation: 5560

Just to make it painfully obvious, you can run the following code to prove to yourself that the stages are in fact synchronized even without join():

import multiprocessing
import time
import datetime

def do_something(i):
    with open(f'{i}.txt', 'w') as fp:
        fp.write(str(i))
    time.sleep(5)


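def main():
    pool = multiprocessing.Pool(5)

    print('Starting process', datetime.datetime.now())
    pool.map(do_something, range(5))  # blocks until all five tasks finish
    print('Waiting for sync', datetime.datetime.now())
    pool.map(do_something, range(5, 10))  # reuses the same workers
    print('Waiting for final sync', datetime.datetime.now())

    pool.close()  # no more tasks will be submitted
    pool.join()   # now it is valid to wait for the workers to exit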

if __name__ == '__main__':
    main()

You'll observe that "Waiting for sync" is not printed until the first map() call has completed all of its tasks. Pool.join() specifically waits for the worker processes to exit, not for them to finish a batch of work. Because map() is a blocking call, nothing here runs asynchronously with the parent, so the stages stay in sync without join().

Upvotes: 0

alex_noname

Reputation: 32083

You don't need to call join() after map() in your case, because map() blocks until all results are ready.

Calling join() before close() or terminate() is incorrect, because join() is a blocking call that waits for the worker processes to exit. For the same reason, you cannot reuse the pool after join().
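A minimal sketch of both points, under a few assumptions: the name demo_blocking_map is made up for this illustration, time.sleep stands in for real work, and the two-process pool size is arbitrary. It times a map() call to show that it only returns once every task has finished, then shows that a pool that has been closed and joined rejects further work.

```python
import multiprocessing
import time


def demo_blocking_map():
    """Hypothetical helper: returns (seconds spent in map(), pool reusable after join?)."""
    pool = multiprocessing.Pool(processes=2)

    start = time.monotonic()
    pool.map(time.sleep, [0.2, 0.2])  # returns only after both sleeps have finished
    elapsed = time.monotonic() - start

    pool.close()  # stop accepting new tasks
    pool.join()   # wait for the worker processes to exit

    try:
        pool.map(time.sleep, [0.1])   # the pool is no longer running
        reusable = True
    except ValueError:
        reusable = False

    return elapsed, reusable


if __name__ == "__main__":
    elapsed, reusable = demo_blocking_map()
    print(f"map() took {elapsed:.2f}s; reusable after join(): {reusable}")
```

The elapsed time should come out at roughly 0.2 seconds or more, which is the synchronization the question was trying to get from join().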

Upvotes: 4

user4815162342

Reputation: 154911

So, why can I not call Pool.join() before Pool.close()

Because join() waits for the workers to exit. Not just finish the tasks they've been given, but actually exit. If you haven't called close() beforehand, no one has told the workers to exit, and they remain on standby, ready to accept further tasks.

So a call to join() not preceded by a call to close() would just hang: join() would wait forever for workers to exit, which no one told them to do. For this reason, Python raises ValueError("Pool is still running") if you attempt to do so.
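You can check this yourself with a short sketch like the one below. The function name join_before_close is hypothetical, and it assumes a reasonably recent Python 3, where the check raises ValueError (older releases may fail differently).

```python
import multiprocessing


def join_before_close():
    """Hypothetical helper: try join() on a running pool and report what happens."""
    pool = multiprocessing.Pool(processes=2)
    try:
        pool.join()  # the pool is still running, so this is rejected
        outcome = "joined"
    except ValueError as exc:
        outcome = f"ValueError: {exc}"
    finally:
        pool.close()  # tell the workers that no more tasks are coming
        pool.join()   # now join() is valid and returns once they exit
    return outcome


if __name__ == "__main__":
    print(join_before_close())
```

The second join() in the finally block succeeds because close() has already told the workers to shut down.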

As David Schwartz pointed out, don't call join() to "synchronize" - it doesn't serve that purpose.

Upvotes: 7

David Schwartz

Reputation: 182763

Just re-use the pool without calling any special functions on it. There's nothing special you need to do to continue to send jobs to the pool. If you aren't completely done with it, just leave it alone and let it keep doing its thing.
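As a rough sketch of what that looks like for the three stages in the question: the name run_stages is invented here, the built-in abs stands in for do1, do2 and do3, and the inputs are arbitrary. Each map() call returns only when its batch is done, so later stages can safely use earlier results, and close() and join() appear once, at the very end.

```python
import multiprocessing


def run_stages():
    """Hypothetical helper: run three dependent batches on a single pool."""
    pool = multiprocessing.Pool(processes=2)
    try:
        stage1 = pool.map(abs, [-3, -2, -1])
        # stage1 is complete here, so it can feed the next batch
        stage2 = pool.map(abs, [-x for x in stage1])
        stage3 = pool.map(abs, [-10 * x for x in stage2])
    finally:
        pool.close()  # done submitting work
        pool.join()   # wait for the workers to exit
    return stage1, stage2, stage3


if __name__ == "__main__":
    print(run_stages())
```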

Upvotes: 0
