Reputation: 640
The documentation for multiprocessing states the following about Pool.join():
Wait for the worker processes to exit. One must call close() or terminate() before using join().
I know that Pool.close() prevents any other task from being submitted to the pool, and that Pool.join() waits for the pool to finish before proceeding with the parent process.
So, why can I not call Pool.join() before Pool.close() in the case when I want to reuse my pool for performing multiple tasks and then finally close() it much later? For example:
pool = Pool()
pool.map(do1)
pool.join() # need to wait here for synchronization
.
.
.
pool.map(do2)
pool.join() # need to wait here again for synchronization
.
.
.
pool.map(do3)
pool.join() # need to wait here again for synchronization
pool.close()
# program ends
Why must one "call close() or terminate() before using join()"?
Upvotes: 17
Views: 1682
Reputation: 5560
Just to make it painfully obvious, you can use the following code to prove to yourself that the code is in fact synchronized regardless:
import multiprocessing
import time
import datetime

def do_something(i):
    with open(f'{i}.txt', 'w') as fp:
        fp.write(str(i))
    time.sleep(5)

def main():
    pool = multiprocessing.Pool(5)
    print('Starting process', datetime.datetime.now())
    pool.map(do_something, range(5))
    print('Waiting for sync', datetime.datetime.now())
    pool.map(do_something, range(5, 10))
    print('Waiting for final sync', datetime.datetime.now())

if __name__ == '__main__':
    main()
You'll observe that Waiting for sync is not printed until the first map() call has completed all of its assigned tasks. Pool.join() specifically waits for the worker processes to exit, not for their tasks to finish. Because map() is itself a blocking call, the batches do not run asynchronously in this configuration and stay in sync without any extra step.
Upvotes: 0
Reputation: 32083
You need not call join() after map() in your case, because map() blocks until all results are ready.
Calling join() before close() or terminate() is incorrect, because join() is a blocking call that waits for the worker processes to exit. For the same reason, you cannot reuse the pool after join().
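A minimal sketch of both points, assuming Python 3.8 or later. The helper name reuse_after_join is hypothetical, and the built-in abs only stands in for a real task:

```python
import multiprocessing


def reuse_after_join():
    """Return (results, error); error is what map() raises after join()."""
    pool = multiprocessing.Pool(2)

    # map() blocks: when it returns, every result is already available.
    results = pool.map(abs, [-1, -2, -3])

    # Shut the pool down in the documented order: close(), then join().
    pool.close()
    pool.join()

    # The pool is no longer running, so it rejects new work.
    try:
        pool.map(abs, [-4])
    except ValueError as exc:
        return results, exc
    return results, None


if __name__ == "__main__":
    results, error = reuse_after_join()
    print(results)      # results of the first batch
    print(repr(error))  # the exception raised by the second map() call
```

The second map() call fails because the pool was closed, which is why join() belongs at the very end of the pool's life rather than between batches.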
Upvotes: 4
Reputation: 154911
So, why can I not call Pool.join() before Pool.close()
Because join() waits for the workers to exit. Not just finish the tasks they've been given, but actually exit. If you didn't call close() beforehand, then no one has told the workers to exit, and they remain on stand-by, ready to accept further tasks.
So a call to join() not preceded by a call to close() would just hang: join() would wait forever for workers to exit, which no one told them to do. For this reason Python raises ValueError("Pool is still running") if you attempt to do so.
As David Schwartz pointed out, don't call join() to "synchronize"; it doesn't serve that purpose.
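The behaviour described above can be checked with a short sketch, assuming Python 3.8 or later. The function name join_before_close is hypothetical, and no tasks are submitted at all:

```python
import multiprocessing


def join_before_close():
    """Call join() on a running pool and return the exception it raises."""
    pool = multiprocessing.Pool(2)
    error = None
    try:
        pool.join()  # neither close() nor terminate() has been called yet
    except ValueError as exc:
        error = exc
    finally:
        # Correct order: tell the workers to exit, then wait for them.
        pool.close()
        pool.join()
    return error


if __name__ == "__main__":
    print(repr(join_before_close()))
```

The second join() in the finally block returns normally, because by then close() has told the workers to exit.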
Upvotes: 7
Reputation: 182763
Just re-use the pool without calling any special functions on it. There's nothing special you need to do to continue to send jobs to the pool. If you aren't completely done with it, just leave it alone and let it keep doing its thing.
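As a rough illustration of this advice, the sketch below runs three batches on one pool and only shuts it down at the end. The name run_phases is hypothetical, and the built-ins abs, len and str merely stand in for the question's do1, do2 and do3:

```python
import multiprocessing


def run_phases():
    """Run three batches on one pool; each map() returns once its batch is done."""
    pool = multiprocessing.Pool(2)
    try:
        first = pool.map(abs, [-1, -2, -3])  # phase 1, blocks until complete
        second = pool.map(len, ["a", "bb"])  # phase 2 starts only after phase 1
        third = pool.map(str, first)         # phase 3 can use earlier results
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # now it is valid to wait for the workers to exit
    return first, second, third


if __name__ == "__main__":
    print(run_phases())
```

No call is needed between the phases, since each map() already waits for its own batch.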
Upvotes: 0