Reputation: 3636
I am writing a daemon program that spawns several child processes. After I run the stop
script, the main process keeps running when it is intended to quit, which really confuses me.
import daemon
import os
import signal
import time
from multiprocessing import Process, cpu_count, JoinableQueue

from http import httpserv
from worker import work

class Manager:
    """
    This manager starts the http server processes and worker
    processes, and creates the input/output queues that keep the
    processes working together nicely.
    """
    def __init__(self):
        self.NUMBER_OF_PROCESSES = cpu_count()

    def start(self):
        self.i_queue = JoinableQueue()
        self.o_queue = JoinableQueue()

        # Create worker processes
        self.workers = [Process(target=work,
                                args=(self.i_queue, self.o_queue))
                        for i in range(self.NUMBER_OF_PROCESSES)]
        for w in self.workers:
            w.daemon = True
            w.start()

        # Create the http server process
        self.http = Process(target=httpserv, args=(self.i_queue, self.o_queue))
        self.http.daemon = True
        self.http.start()

        # Keep the current process from returning
        self.running = True
        while self.running:
            time.sleep(1)

    def stop(self):
        print "quitting ..."

        # Stop accepting new requests from users
        os.kill(self.http.pid, signal.SIGINT)

        # Wait for all requests in the output queue to be delivered
        self.o_queue.join()

        # Put a sentinel None on the input queue to signal the worker
        # processes to terminate
        self.i_queue.put(None)
        for w in self.workers:
            w.join()
        self.i_queue.join()

        # Let the main process return
        self.running = False


manager = Manager()

context = daemon.DaemonContext()
context.signal_map = {
    signal.SIGHUP: lambda signum, frame: manager.stop(),
}
context.open()
manager.start()
The stop script is just a one-liner, os.kill(pid, signal.SIGHUP). After that, the child processes (the worker processes and the http server process) end nicely, but the main process just stays there, and I don't know what keeps it from returning.
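For reference, the stop script amounts to something like this sketch, assuming the daemon's PID has been recorded in a pidfile (the path /tmp/manager.pid is hypothetical):

import os
import signal

# Hypothetical pidfile path; use wherever your daemon records its PID
PIDFILE = "/tmp/manager.pid"

with open(PIDFILE) as f:
    pid = int(f.read().strip())

# Trigger the daemon's SIGHUP handler, which calls manager.stop()
os.kill(pid, signal.SIGHUP)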
Upvotes: 4
Views: 2608
Reputation: 13518
I tried a different approach, and this seems to work (note that I took out the daemon portions of the code, as I didn't have that module installed).
import signal
from multiprocessing import cpu_count

class Manager:
    """
    This manager starts the http server processes and worker
    processes, and creates the input/output queues that keep the
    processes working together nicely.
    """
    def __init__(self):
        self.NUMBER_OF_PROCESSES = cpu_count()

    def start(self):
        # all your code minus the loop
        print "waiting to die"
        signal.pause()

    def stop(self):
        print "quitting ..."
        # all your code minus self.running


manager = Manager()
signal.signal(signal.SIGHUP, lambda signum, frame: manager.stop())
manager.start()
One warning: signal.pause() will unpause for any signal, so you may want to change your code accordingly.
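If that matters in your case, one option (my sketch, using the same structure as above) is to keep pausing until stop() has actually cleared a flag, so a stray signal can't end the main loop early:

import signal

class Manager:
    def start(self):
        print "waiting to die"
        self.running = True
        while self.running:
            # pause() wakes on *any* signal; only the SIGHUP handler
            # calls stop(), which clears the flag and ends the loop
            signal.pause()
        print "quit"

    def stop(self):
        print "quitting ..."
        self.running = False

manager = Manager()
signal.signal(signal.SIGHUP, lambda signum, frame: manager.stop())
manager.start()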
EDIT:
The following works just fine for me:
import daemon
import signal
import time

class Manager:
    """
    This manager starts the http server processes and worker
    processes, and creates the input/output queues that keep the
    processes working together nicely.
    """
    def __init__(self):
        self.NUMBER_OF_PROCESSES = 5

    def start(self):
        # all your code minus the loop
        print "waiting to die"
        self.running = 1
        while self.running:
            time.sleep(1)
        print "quit"

    def stop(self):
        print "quitting ..."
        # all your code minus self.running
        self.running = 0


manager = Manager()

context = daemon.DaemonContext()
context.signal_map = {signal.SIGHUP: lambda signum, frame: manager.stop()}
context.open()
manager.start()
What version of Python are you using?
Upvotes: 1
Reputation: 99520
You create the http server process but don't join() it. What happens if, rather than doing an os.kill() to stop the http server process, you send it a stop-processing sentinel (None, like the one you send to the workers) and then do a self.http.join()?
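That change to stop() might look roughly like this; it assumes httpserv get()s from one of the queues it was handed and exits cleanly when it sees None, which your httpserv code would have to support:

# Inside Manager.stop(), replacing the os.kill() call.
# Assumption: httpserv reads self.o_queue and treats None
# as a request to shut down.
self.o_queue.put(None)
self.http.join()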
Update: You also need to send the None sentinel to the input queue once for each worker. You could try:
for w in self.workers:
    self.i_queue.put(None)
for w in self.workers:
    w.join()
N.B. The reason you need two loops is that if you put the None into the queue in the same loop that does the join(), that None may be picked up by a worker other than w, so joining on w will cause the caller to block.
You don't show the code for the workers or the http server, so I assume these are well-behaved in terms of calling task_done() etc., and that each worker will quit as soon as it sees a None, without get()-ing any more items from the input queue.
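For illustration, a worker that is well-behaved in that sense might look like this sketch (the question doesn't show work(), and process() here is a hypothetical handler):

def work(i_queue, o_queue):
    while True:
        item = i_queue.get()
        if item is None:
            # Account for the sentinel so i_queue.join() can return,
            # then exit without get()-ing anything further
            i_queue.task_done()
            break
        o_queue.put(process(item))  # process() is hypothetical
        i_queue.task_done()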
Also, note that there is at least one open, hard-to-reproduce issue with JoinableQueue.task_done(), which may be biting you.
Upvotes: 1