Reputation: 1807
I have an issue in Python 3.7.3 where my multiprocessing operation (using Queue, Pool, and apply_async) deadlocks when handling large computational tasks.
For small computations, this multiprocessing task works just fine. However, when dealing with larger processes, the multiprocessing task stops, or deadlocks, altogether without exiting the process! I read that this will happen if you "grow your queue without bounds, and you are joining up to a subprocess that is waiting for room in the queue [...] your main process is stalled waiting for that one to complete, and it never will." (Process.join() and queue don't work with large numbers)
I am having trouble converting this concept into code. I would greatly appreciate guidance on refactoring the code I have written below:
import multiprocessing as mp
def listener(q, d): # task to queue information into a manager dictionary
while True:
item_to_write = q.get()
if item_to_write == 'kill':
break
foo = d['region']
foo.add(item_to_write)
d['region'] = foo # add items and set to manager dictionary
def main():
manager = mp.Manager()
q = manager.Queue()
d = manager.dict()
d['region'] = set()
pool = mp.Pool(mp.cpu_count() + 2)
watcher = pool.apply_async(listener, (q, d))
jobs = []
for i in range(24):
job = pool.apply_async(execute_search, (q, d)) # task for multiprocessing
jobs.append(job)
for job in jobs:
job.get() # begin multiprocessing task
q.put('kill') # kill multiprocessing task (view listener function)
pool.close()
pool.join()
print('process complete')
if __name__ == '__main__':
main()
Ultimately, I would like to prevent deadlocking altogether to facilitate a multiprocessing task that could operate indefinitely until completion.
BELOW IS THE TRACEBACK WHEN EXITING DEADLOCK IN BASH
^CTraceback (most recent call last):
File "multithread_search_cl_gamma.py", line 260, in <module>
main(GEOTAG)
File "multithread_search_cl_gamma.py", line 248, in main
job.get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 651, in get
Process ForkPoolWorker-28:
Process ForkPoolWorker-31:
Process ForkPoolWorker-30:
Process ForkPoolWorker-27:
Process ForkPoolWorker-29:
Process ForkPoolWorker-26:
self.wait(timeout)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 648, in wait
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/queues.py", line 351, in get
with self._rlock:
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/queues.py", line 351, in get
self._event.wait(timeout)
File "/Users/Ira/anaconda3/lib/python3.7/threading.py", line 552, in wait
Traceback (most recent call last):
Traceback (most recent call last):
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
res = self._reader.recv_bytes()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/queues.py", line 351, in get
with self._rlock:
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
signaled = self._cond.wait(timeout)
File "/Users/Ira/anaconda3/lib/python3.7/threading.py", line 296, in wait
waiter.acquire()
KeyboardInterrupt
with self._rlock:
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/queues.py", line 351, in get
with self._rlock:
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/queues.py", line 351, in get
with self._rlock:
File "/Users/Ira/anaconda3/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Below is the updated script:
import multiprocessing as mp
import queue
def listener(q, d, stop_event):
while not stop_event.is_set():
try:
while True:
item_to_write = q.get(False)
if item_to_write == 'kill':
break
foo = d['region']
foo.add(item_to_write)
d['region'] = foo
except queue.Empty:
pass
time.sleep(0.5)
if not q.empty():
continue
def main():
manager = mp.Manager()
stop_event = manager.Event()
q = manager.Queue()
d = manager.dict()
d['region'] = set()
pool = mp.get_context("spawn").Pool(mp.cpu_count() + 2)
watcher = pool.apply_async(listener, (q, d, stop_event))
stop_event.set()
jobs = []
for i in range(24):
job = pool.apply_async(execute_search, (q, d))
jobs.append(job)
for job in jobs:
job.get()
q.put('kill')
pool.close()
pool.join()
print('process complete')
if __name__ == '__main__':
main()
UPDATE::
execute_command
executes several processes necessary for search, so I put in code for where q.put()
lies.
Alone, the script will take > 72 hrs to finish. Each multiprocess never completes the entire task, rather they work individually and reference a manager.dict()
to avoid repeating tasks. These tasks work until every tuple in the manager.dict()
has been processed.
def area(self, tup, housing_dict, q):
state, reg, sub_reg = tup[0], tup[1], tup[2]
for cat in housing_dict:
"""
computationally expensive, takes > 72 hours
for a list of 512 tup(s)
"""
result = self.search_geotag(
state, reg, cat, area=sub_reg
)
q.put(tup)
The q.put(tup)
is ultimately placed in the listener
function to add tup
to the manager.dict()
Upvotes: 3
Views: 3507
Reputation: 3801
Since listener
and execute_search
are sharing the same queue object, there could be race,
where execute_search
gets 'kill' from queue before listener
does, thus listener
will stuck in blocking get()
forever, since there are no more new items.
For that case you can use Event object to signal all processes to stop:
import multiprocessing as mp
import queue
def listener(q, d, stop_event):
while not stop_event.is_set():
try:
item_to_write = q.get(timeout=0.1)
foo = d['region']
foo.add(item_to_write)
d['region'] = foo
except queue.Empty:
pass
print("Listener process stopped")
def main():
manager = mp.Manager()
stop_event = manager.Event()
q = manager.Queue()
d = manager.dict()
d['region'] = set()
pool = mp.get_context("spawn").Pool(mp.cpu_count() + 2)
watcher = pool.apply_async(listener, (q, d, stop_event))
stop_event.set()
jobs = []
for i in range(24):
job = pool.apply_async(execute_search, (q, d))
jobs.append(job)
try:
for job in jobs:
job.get(300) #get the result or throws a timeout exception after 300 seconds
except multiprocessing.TimeoutError:
pool.terminate()
stop_event.set() # stop listener process
print('process complete')
if __name__ == '__main__':
main()
Upvotes: 1