karlosss

Reputation: 3155

Linux - reasons for SIGSTOP and how to deal with it?

I have a Python script that runs bash scripts. I need to be able to kill a bash script if it seems to run forever, and it also has to run in a chroot jail because the script might be dangerous. I run it with psutil.Popen() and let it run for two seconds. If it does not finish on its own, I send SIGKILL to it and to all of its children.

The problem is that if I kill one script for exceeding the time limit and then run another one, the main (Python) script receives a SIGSTOP. On my local machine, I came up with a really stupid workaround: at startup, the Python script wrote its PID to a file, and a second script sent SIGCONT to that PID every second. This has two problems: it is really stupid, and, even worse, it does not work on the server, where SIGCONT just does nothing.

The sequence is: the Python script runs a bash script responsible for the jail, and that bash script runs the possibly dangerous and/or infinite script, which may in turn start children of its own.

The relevant parts of the code:

Main Python script

    p = psutil.Popen(["bash", mode, script_path, self.TESTENV_ROOT])
    start = time.time()

    while True:
        if p.status() == psutil.STATUS_ZOMBIE:
            # process ended naturally
            duration = time.time() - start
            self.stdout.write("Script finished, execution time: {}s".format(duration))
            break

        if time.time() > start + run_limit:
            children = p.children(recursive=True)
            for child in children:
                child.kill()
            p.kill()
            duration = None
            self.stdout.write("Script exceeded maximum time ({}s) and was killed.".format(run_limit))
            break

        time.sleep(0.01)

    os.kill(os.getpid(), 17)  # SIGCHLD
    return duration

Bash script that runs the script in the chroot jail ($1 is the script to run, $2 is the jail path):

    #!/usr/bin/env bash

    # copy script to chroot environment
    cp "$1" "$2/prepare.sh"

    # run script
    chmod u+x "$2/prepare.sh"
    echo './prepare.sh' | chroot "$2"
    rm "$2/prepare.sh"

Example prepare.sh script

    #!/bin/bash
    echo asdf > file

I spent some time trying to solve the issue and found out that this script, which does not use the chroot jail to run the bash scripts, works perfectly:

    import psutil
    import os
    import time

    while True:
        if os.path.exists("infinite.sh"):
            p = psutil.Popen(["bash","infinite.sh"])
            start = time.time()

            while True:
                if p.status() == psutil.STATUS_ZOMBIE:
                    # process ended naturally
                    break

                if time.time() > start + 2:
                    # process needs too much time and has to be killed
                    children = p.children(recursive=True)
                    for child in children:
                        child.kill()

                    p.kill()
                    break

            os.remove("infinite.sh")
            os.kill(os.getpid(), 17)

My questions are:

Thanks for your ideas.

EDIT: I found out that I get stopped at the moment I run the first script after killing an overtime one. It does not matter whether I use os.system or psutil.Popen.

EDIT2: I did even more investigation and the critical line is echo './prepare.sh' | chroot "$2" in the bash script controlling the chroot jail. The question now is, what the hell is wrong with it?

EDIT3: This might be a related problem, if it helps someone.

Upvotes: 0

Views: 2359

Answers (3)

Felix

Reputation: 301

This thread is a bit old, but I believe I know the cause of your problem (I had a similar issue).

The Linux signal(7) man page says:

Linux supports the standard signals listed below. [...] First the signals described in the original POSIX.1-1990 standard.

  Signal     Value     Action   Comment
   ──────────────────────────────────────────────────────────────────────
   SIGHUP        1       Term    Hangup detected on controlling terminal
                                 or death of controlling process
   SIGINT        2       Term    Interrupt from keyboard
   SIGQUIT       3       Core    Quit from keyboard
   SIGILL        4       Core    Illegal Instruction
   SIGABRT       6       Core    Abort signal from abort(3)
   SIGFPE        8       Core    Floating-point exception
   SIGKILL       9       Term    Kill signal
   SIGSEGV      11       Core    Invalid memory reference
   SIGPIPE      13       Term    Broken pipe: write to pipe with no
                                 readers; see pipe(7)
   SIGALRM      14       Term    Timer signal from alarm(2)
   SIGTERM      15       Term    Termination signal
   SIGUSR1   30,10,16    Term    User-defined signal 1
   SIGUSR2   31,12,17    Term    User-defined signal 2
   SIGCHLD   20,17,18    Ign     Child stopped or terminated
   SIGCONT   19,18,25    Cont    Continue if stopped
   SIGSTOP   17,19,23    Stop    Stop process
   SIGTSTP   18,20,24    Stop    Stop typed at terminal
   SIGTTIN   21,21,26    Stop    Terminal input for background process
   SIGTTOU   22,22,27    Stop    Terminal output for background process

It shows that, by default, a process is also stopped when it receives SIGTSTP, SIGTTIN or SIGTTOU.

This page explains that:

[SIGTTIN and SIGTTOU] are signals that are sent to background processes when they attempt to read from (SIGTTIN) or write to (SIGTTOU) their controlling terminal (or tty).
...
[...] changing terminal settings [from a background process] does cause SIGTTOU to be sent

I used sudo strace -tt -o [trace_output_file] -p [pid] to find out which signal stopped my process.
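If the stopped process is a child you started yourself, the parent can also ask the kernel directly which signal stopped it, without strace. A minimal sketch, assuming Linux/POSIX and only the standard os, signal and subprocess modules; the helper name stop_signal_of is hypothetical, and the child here stops itself with SIGSTOP purely as a stand-in for whatever signal hits the real process:

```python
import os
import signal
import subprocess


def stop_signal_of(pid):
    """Block until child `pid` stops or exits.

    Returns the name of the signal that stopped it, or None if it exited.
    """
    # WUNTRACED makes waitpid() also report children that were stopped,
    # not only children that terminated.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if os.WIFSTOPPED(status):
        return signal.Signals(os.WSTOPSIG(status)).name
    return None


if __name__ == "__main__":
    # Stand-in child: the shell sends SIGSTOP to itself ($$ is its own PID).
    child = subprocess.Popen(["sh", "-c", "kill -STOP $$; exit 0"])
    print(stop_signal_of(child.pid))  # SIGSTOP
    os.kill(child.pid, signal.SIGCONT)  # resume it so it can finish
    child.wait()
```

A stopped child is not reaped by waitpid(), so the usual Popen.wait() still works afterwards.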

How to solve the problem? Unfortunately, I cannot get your reduced example to work: what does your infinite.sh look like? Why are you removing it during execution? I suggest redirecting stdin, stdout and stderr. Have you tried the following?

    import psutil
    from subprocess import DEVNULL

    # keyword arguments are lowercase: stdin, stdout, stderr
    p = psutil.Popen(["bash", mode, script_path, self.TESTENV_ROOT],
                     stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL)

You can of course also use subprocess.PIPE to handle the output in your Python code or simply redirect to a file. I am not sure how to handle unauthorized attempts to modify the tty settings.
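Putting the redirection together with the time limit from the question gives something like the following minimal sketch. It uses plain subprocess (psutil.Popen accepts the same keyword arguments), and run_with_limit is a hypothetical name. Two choices go beyond the answer and are assumptions: start_new_session=True runs the child in a new session, so it has no controlling terminal to fight over, and the time limit is enforced by killing the child's whole process group instead of walking children().

```python
import os
import signal
import subprocess


def run_with_limit(argv, limit):
    """Run argv for at most `limit` seconds.

    Returns (returncode, stdout_bytes, stderr_bytes, timed_out).
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,  # the script can never read from the terminal
        stdout=subprocess.PIPE,    # output goes to Python, not to the tty
        stderr=subprocess.PIPE,
        start_new_session=True,    # setsid(): new session and process group
    )
    try:
        out, err = proc.communicate(timeout=limit)
        return proc.returncode, out, err, False
    except subprocess.TimeoutExpired:
        try:
            # The child leads its own process group (pgid == pid), so this
            # kills it and every descendant that stayed in that group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # everything already exited in the meantime
        out, err = proc.communicate()  # reap the child and drain the pipes
        return proc.returncode, out, err, True
```

For the question's setup the call would be along the lines of run_with_limit(["bash", mode, script_path, self.TESTENV_ROOT], run_limit). A descendant that calls setsid() itself escapes the group kill, so this is a sketch, not a sandbox.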

Upvotes: 1

karlosss

Reputation: 3155

OK, I finally found the solution. The problem really was the chroot line in the bash script:

    echo './prepare.sh' | chroot "$2"

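This is apparently not the right way to do it. Without a command, chroot starts an interactive shell ("$SHELL" -i), and an interactive shell tries to take over the terminal for job control, which is presumably what ended up stopping the Python script. The correct way to run a command in chroot is to pass the shell and the command explicitly: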

    chroot chroot_path shell -c command

So for example:

    chroot '/home/chroot_jail' '/bin/sh' -c 'rm -rf /'
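Applied to the jail script from the question, the echo './prepare.sh' | chroot "$2" line would become chroot "$2" /bin/sh -c './prepare.sh'. The same steps can also be done from Python; a minimal sketch, assuming /bin/sh exists inside the jail and that the chroot call itself runs as root (chroot requires it). prepare_jail and jail_argv are hypothetical names that mirror the cp and chmod u+x lines of the bash script:

```python
import os
import shutil
import stat


def prepare_jail(script_path, jail_path, name="prepare.sh"):
    """Copy the script into the jail and make it executable for its owner
    (the equivalent of `cp` + `chmod u+x` in the bash script)."""
    target = os.path.join(jail_path, name)
    shutil.copy(script_path, target)
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


def jail_argv(jail_path, name="prepare.sh", shell="/bin/sh"):
    """Argument list for `chroot JAIL SHELL -c ./NAME`.

    GNU chroot changes into the new root, so ./NAME refers to JAIL/NAME.
    The shell receives its command via -c, so it is not interactive.
    """
    return ["chroot", jail_path, shell, "-c", "./" + name]
```

The resulting list can be passed to subprocess.Popen or psutil.Popen (with stdin redirected, as suggested in the other answer), and the copied file removed afterwards, as the bash script does with rm.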

Hope this helps someone.

Upvotes: 1

ElmoVanKielmo

Reputation: 11316

I'm pretty sure you're running this on Mac OS and not Linux. Why? You're sending signal 17 to your main python process instead of using:

    import signal
    signal.SIGCHLD

I believe you have a handler for signal 17 which is supposed to respawn the jailed process in response to this signal.
But signal.SIGCHLD == 17 on Linux and signal.SIGCHLD == 20 on Mac OS.

So the answer to your question is:
signal.SIGSTOP == 17 on Mac OS (see the Mac OS signal man page).
Yes, your process sends SIGSTOP to itself with os.kill(os.getpid(), 17).

EDIT:
Actually, it can also happen on Linux: the Linux signal man page shows that signal 17 is SIGUSR2, SIGCHLD or SIGSTOP depending on the architecture (SIGSTOP on Alpha and SPARC, SIGCHLD on x86 and ARM, SIGUSR2 on MIPS). Therefore I strongly recommend using the constants from the signal module of the standard library instead of hardcoded signal numbers.
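A minimal sketch of that recommendation: refer to signals through the signal module, and when a raw number turns up (in old code or a log), decode it with signal.Signals instead of guessing. The helper signal_name is hypothetical, and what the last lines print depends on the platform the code runs on.

```python
import os
import signal


def signal_name(signum):
    """Translate a raw signal number into this platform's signal name."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown signal {}".format(signum)


# Portable: the constant always means SIGCHLD, whatever its number is here.
os.kill(os.getpid(), signal.SIGCHLD)

# A hard-coded 17 is SIGCHLD on Linux x86/ARM but SIGSTOP on macOS.
print(signal_name(17))
for sig in (signal.SIGCHLD, signal.SIGSTOP, signal.SIGCONT):
    print(sig.name, int(sig))
```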

Upvotes: 2
