I'm trying to write something like a supervisor for my Python daemon process, and I found that the same code works in Python 2 but not in Python 3.
I've reduced it to this minimal example:
daemon.py
```python
#!/usr/bin/env python
import signal
import sys
import os


def stop(*args, **kwargs):
    print('daemon exited', os.getpid())
    sys.exit(0)


signal.signal(signal.SIGTERM, stop)

print('daemon started', os.getpid())

while True:
    pass
```
supervisor.py
```python
import os
import signal
import subprocess
from time import sleep

parent_pid = os.getpid()

commands = [
    [
        './daemon.py'
    ]
]

popen_list = []
for command in commands:
    popen = subprocess.Popen(command, preexec_fn=os.setsid)
    popen_list.append(popen)


def stop_workers(*args, **kwargs):
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)
        while True:
            popen_return_code = popen.poll()
            if popen_return_code is not None:
                break
            sleep(5)


signal.signal(signal.SIGTERM, stop_workers)

for popen in popen_list:
    print('wait_main', popen.wait())
```
If you run supervisor.py and then run kill -15 on its PID, it hangs in an infinite loop, because popen_return_code never becomes anything other than None. I found that this is essentially caused by the threading.Lock that Python 3 added around the waitpid operation (source), but how can I rewrite the code so that it handles the child's exit correctly?
I generally agree with the answer from @risboo6909, but I also have some thoughts on how to fix this situation:

- Switch from subprocess.Popen to psutil.Popen.
- Instead of the final popen.wait() calls, you can just run an infinite loop, because the process will exit in the signal handler.
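A minimal sketch of the infinite-loop suggestion, using only the standard library (it does not use psutil). The helper names start_workers, run_supervisor and on_sigterm are hypothetical, and stop_workers here takes the list as a parameter, unlike the question's version. It also assumes start_new_session=True, which makes the child call setsid() just like preexec_fn=os.setsid (POSIX only). Because the main thread idles in time.sleep() rather than in popen.wait(), the handler's own wait() calls are not blocked by the lock:

```python
import signal
import subprocess
import sys
import time


def start_workers(commands):
    """Start each command in a new session (same effect as preexec_fn=os.setsid)."""
    return [subprocess.Popen(cmd, start_new_session=True) for cmd in commands]


def stop_workers(popen_list):
    """Send SIGTERM to every child, then reap each one and return the exit codes."""
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)
    # When called from on_sigterm below, the main thread is sleeping rather than
    # sitting inside wait(), so these wait() calls can take the waitpid lock.
    return [popen.wait() for popen in popen_list]


def run_supervisor(commands):
    """Idle forever; the process leaves through the SIGTERM handler."""
    popen_list = start_workers(commands)

    def on_sigterm(signum, frame):
        for code in stop_workers(popen_list):
            print('worker exited with', code)
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    while True:
        time.sleep(1)  # plain sleep instead of popen.wait(), so no lock is held


# Hypothetical entry point: run_supervisor([['./daemon.py']])
```

Calling run_supervisor([['./daemon.py']]) from your script and then sending kill -15 to the supervisor should make it terminate the daemon, print its exit code, and exit.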
This is an interesting case.
I've spent a few hours trying to figure out why this happens, and the only explanation I've come up with so far is that the implementation of wait() and poll() changed between Python 2.7 and Python 3.
Looking at the source of Python 3's subprocess.py, we can see that a lock is acquired when you call the wait() method of a Popen object; see https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1402.
This lock prevents subsequent poll() calls from working as expected until the lock acquired by wait() is released; see https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1355 and the comment there:
"Something else is busy calling waitpid. Don't allow two at once. We know nothing yet."
Python signal handlers always run in the main thread, and in your supervisor that thread is still blocked inside wait() holding the lock, so every poll() in stop_workers returns None and the loop never ends.
There is no such lock in Python 2.7's subprocess.py, so this looks like the reason why the code works in Python 2.7 but not in Python 3.
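To make the mechanism concrete, here is a toy model of the pattern described above. ToyPopen and handler_that_polls are hypothetical and not CPython's actual code. In the model, wait() holds a non-reentrant threading.Lock for the whole "blocking" call, poll() only tries a non-blocking acquire and returns None if that fails, and a "signal handler" that polls runs in the same thread while the lock is held:

```python
import threading


class ToyPopen:
    """Hypothetical toy model of the wait()/poll() locking pattern (not CPython's code)."""

    def __init__(self):
        self.waitpid_lock = threading.Lock()  # non-reentrant
        self.returncode = None
        self.child_exited = False
        self.interrupt = None  # stands in for a signal handler firing mid-wait

    def poll(self):
        # Non-blocking acquire: if anyone (even this thread) holds the lock, give up.
        if not self.waitpid_lock.acquire(blocking=False):
            return None  # "Something else is busy calling waitpid."
        try:
            if self.child_exited:
                self.returncode = 0
            return self.returncode
        finally:
            self.waitpid_lock.release()

    def wait(self):
        with self.waitpid_lock:
            self.child_exited = True  # the child has already exited...
            if self.interrupt is not None:
                self.interrupt()  # ...but the "handler" runs while the lock is held
            self.returncode = 0
        return self.returncode


def handler_that_polls(popen, attempts=3):
    """Like stop_workers in the question, but bounded so the demo terminates."""
    return [popen.poll() for _ in range(attempts)]


p = ToyPopen()
results = []
p.interrupt = lambda: results.extend(handler_that_polls(p))
print('wait returned', p.wait())                  # wait returned 0
print('poll results inside handler', results)     # poll results inside handler [None, None, None]
print('poll after wait', p.poll())                # poll after wait 0
```

Because threading.Lock is not reentrant, even the thread that already holds it gets False from a non-blocking acquire, so polls made from inside the handler can never succeed; an unbounded loop like the one in the question's stop_workers would spin forever.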
However, I don't see why you are calling poll() inside the signal handler at all. Try rewriting your supervisor.py as follows; this should work as expected on both Python 3 and Python 2.7:
supervisor.py
```python
import os
import signal
import subprocess
from time import sleep

parent_pid = os.getpid()

commands = [
    [
        './daemon.py'
    ]
]

popen_list = []
for command in commands:
    popen = subprocess.Popen(command, preexec_fn=os.setsid)
    popen_list.append(popen)


def stop_workers(*args, **kwargs):
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)


signal.signal(signal.SIGTERM, stop_workers)

for popen in popen_list:
    print('wait_main', popen.wait())
```
Hope this helps