Python script checking if a particular Linux command is still running

Question

I want to write a Python script that checks every minute whether some pre-defined process is still running on a Linux machine and, if it isn't, prints a timestamp of when it crashed. I have written a script that does exactly that, but unfortunately it works correctly with only one process.

This is my code:

import subprocess
import shlex
import time
from datetime import datetime

proc_def = "top"
grep_cmd = "pgrep -a " + proc_def
try:
    proc_run = subprocess.check_output(shlex.split(grep_cmd)).decode('utf-8')
    proc_run = proc_run.strip().split('\n')
    '''
    Creating a dictionary with key the PID of the process and value
    the command line
    '''
    proc_dict = dict(zip([i.split(' ', 1)[0] for i in proc_run],
                         [i.split(' ', 1)[1] for i in proc_run]))

    check_run = "ps -o pid= -p "

    for key, value in proc_dict.items():
        check_run_cmd = check_run + key
        try:
            # While the output of check_run_cmd isn't empty line do
            while subprocess.check_output(
                                          shlex.split(check_run_cmd)
                                          ).decode('utf-8').strip():
                # This print statement is for debugging purposes only
                print("Running")
                time.sleep(3)
        # If check_run_cmd returns an error, show the time and date of the
        # crash as well as the PID and the command line
        except subprocess.CalledProcessError as e:
            print(f"PID: {key} of command: \"{value}\" stopped "
                  f"at {datetime.now().strftime('%d-%m-%Y %T')}")
            exit(1)
# Check if the proc_def is actually running on the machine
except subprocess.CalledProcessError as e:
    print(f"The \"{proc_def}\" command isn't running on this machine")

For example, if there are two top processes, it shows me the crash time of only one of them and then exits. I want the script to stay active as long as another process is still running and to exit only once both processes have been killed. It should report when each of the processes crashed.

It should also not be limited to two processes; it should support any number of processes started with the same proc_def command.

kabanus · Accepted Answer

You have to change the logic a bit, but basically you want an outer loop that keeps cycling through all the processes, rather than checking the same one over and over:

import subprocess
import shlex
import time
from datetime import datetime

proc_def = "top"
grep_cmd = "pgrep -a " + proc_def
try:
    proc_run = subprocess.check_output(shlex.split(grep_cmd)).decode('utf-8')
    proc_run = proc_run.strip().split('\n')
    '''
    Creating a dictionary with key the PID of the process and value
    the command line
    '''
    proc_dict = dict(zip([i.split(' ', 1)[0] for i in proc_run],
                         [i.split(' ', 1)[1] for i in proc_run]))

    check_run = "ps -o pid= -p "
    while proc_dict:
        for key, value in proc_dict.items():
            check_run_cmd = check_run + key
            try:
                # check_output raises CalledProcessError once the PID is gone
                subprocess.check_output(shlex.split(check_run_cmd))
                # This print statement is for debugging purposes only
                print("Running")
                time.sleep(3)
            except subprocess.CalledProcessError as e:
                print(f"PID: {key} of command: \"{value}\" stopped at {datetime.now().strftime('%d-%m-%Y %T')}")
                del proc_dict[key]
                break
# Check if the proc_def is actually running on the machine
except subprocess.CalledProcessError as e:
    print(f"The \"{proc_def}\" command isn't running on this machine")

This suffers from the same problems as the original code: the time resolution is 3 seconds, and if a new process starts while the script is running, it won't be monitored (though this may be desired).

The first problem can be fixed by sleeping for less time, depending on what you need; the second by re-running the initial lines that create proc_dict inside the while loop.
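As a rough sketch of that second suggestion (not a drop-in replacement for the code above), the pgrep scan can be repeated on every cycle and compared with the previous result. The function names parse_pgrep, find_procs, report_stopped and monitor are made up for illustration, and the sketch assumes Python 3.7+ and that pgrep is installed:

```python
import subprocess
import time
from datetime import datetime


def parse_pgrep(output):
    """Turn `pgrep -a` output into a {pid: command line} dictionary."""
    procs = {}
    for line in output.strip().splitlines():
        pid, _, cmdline = line.partition(' ')
        procs[pid] = cmdline
    return procs


def find_procs(name):
    """Return {pid: command line} for processes matching name (may be empty)."""
    result = subprocess.run(['pgrep', '-a', name],
                            capture_output=True, text=True)
    # pgrep exits with status 1 when nothing matches; treat that as "none".
    return parse_pgrep(result.stdout) if result.returncode == 0 else {}


def report_stopped(known, current):
    """Print and return the entries of known whose PID is not in current."""
    stopped = {pid: cmd for pid, cmd in known.items() if pid not in current}
    stamp = datetime.now().strftime('%d-%m-%Y %H:%M:%S')
    for pid, cmd in stopped.items():
        print(f'PID: {pid} of command: "{cmd}" stopped at {stamp}')
    return stopped


def monitor(name, interval=3):
    """Poll until no process matching name is left."""
    known = find_procs(name)
    if not known:
        print(f"The '{name}' command isn't running on this machine")
        return
    while known:
        time.sleep(interval)
        current = find_procs(name)
        report_stopped(known, current)
        # Replacing known also picks up processes started since the last poll.
        known = current
```

Calling monitor("top") would then keep reporting until the last matching process is gone. Because the whole process list is re-read once per cycle, the sleep happens once per cycle rather than once per process.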
