Reputation: 41
Python 2.7.3 on Solaris 10
Questions
Background and Code
I have a python script that first spawns a worker management thread. The worker management thread then spawns one or more worker threads. I have other stuff going on in my main thread that I cannot block. My management thread stuff and worker threads are rock-solid. My services run for years without restarts but then we have this subprocess.Popen
scenario:
In the run method of the worker thread, I am using:
class workerThread(threading.Thread):
def __init__(self) :
super(workerThread, self).__init__()
...
def run(self)
...
atempfile = tempfile.NamedTempFile(delete=False)
myprocess = subprocess.Popen( ['third-party-cmd', 'with', 'arguments'], shell=False, stdin=subprocess.PIPE, stdout=atempfile, stderr=subprocess.STDOUT,close_fds=True)
...
I need to use myprocess.poll()
to check for process termination because I need to scan the atempfile
until I find relevant information (the file may be > 1 GiB) and I need to terminate the process because of user request or because the process has been running too long. Once I find what I am looking for, I will stop checking the stdout temp file. I will clean it up after the external process is dead and before the worker thread terminates. I need the stdin PIPE in case I need to inject a response to something interactive in the child's stdin stream.
In my main program, I set a SIGINT and SIGTERM handler for me to perform cleanup, if my main python program is terminated with SIGTERM or SIGINT(Ctrl-C) if running from the shell.
Does anyone have a solid 2.x recipe for child signal handling in threads? ctypes sigprocmask, etc.
Any help would be very appreciated. I am just looking for an 'official' recipe or the BEST hack, if one even exists.
Notes
I am using a restricted build of Python. I must use 2.7.3. Third-party-cmd is a program I do not have source for - modifying it is not possible.
Upvotes: 4
Views: 1235
Reputation: 8020
There are many things in your description that look strange. First thing, you have a couple of different threads and processes. Who is crashing, who's receinving SIGTERM and who's receiving SIGKILL and due to which operations ?
Second: why does your parent receive SIGTERM ? It can't be implicitly sent. Someone is calling kill to your parent process, either directly or indirectly (for example, by killing the whole parent group).
Third point: how's your program terminating when you're handling SIGTERM ? By definition, the program terminates if it's not handled. If it's handled, it's not terminated. What's really happenning ?
Suggestions:
$ cat crsh.c
#include <stdio.h>
int main(void)
{
int *f = 0x0;
puts("Crashing");
*f = 0;
puts("Crashed");
return 0;
}
$ cat a.py
import subprocess, sys
print('begin')
p = subprocess.Popen('./crsh')
a = raw_input()
print(a)
p.wait()
print('end')
$ python a.py
begin
Crashing
abcd
abcd
end
This works. No signal delivered to the parent. Did you isolate the problem in your program ?
If the problem is a signal sent to multiple processes: can you use setpgid to set up a separate process group for the child ?
Is there any reason for creating the temporary file ? It's 1 GB files being created in your temporary directory. Why not piping stdout ?
If you're really sure you need to handle signals in your parent program (why didn't you try/except KeyboardInterrupt, for example ?): could signal() unspecified behavior with multi threaded programs be causing those problems (for example, dispatching a signal to a thread that does not handle signals) ?
NOTES
The effects of signal() in a multithreaded process are unspecified.
Anyway, try to explain with more precision what are the threads and process of your program, what they do, how were the signal handlers set up and why, who is sending signals, who is receiving, etc, etc, etc, etc, etc.
Upvotes: 1