Hoffs

Reputation: 127

Killing process started by bash script but not script itself

I have a script that keeps a server alive: it starts the server process and starts it again whenever the process stops. Sometimes, though, the server becomes unresponsive. To handle that, I want a second script that pings the server and kills the process if it doesn't respond within 60 seconds.

The problem is that if I kill the server process the bash script also gets terminated.

The start script is just a while loop that runs sh Server.sh, which is another shell script that passes additional parameters when starting the server. The server is written in Java, so this starts a java process. If the server hangs, I use kill -9 pid because nothing else stops it. If the server doesn't hang and does its usual restart, it stops gracefully and the bash script starts the next iteration of the loop.

Upvotes: 0

Views: 1546

Answers (1)

Charles Duffy

Reputation: 295403

Doing The Right Thing

Use a real process supervision system -- your Linux distribution almost certainly includes one.

Directly monitoring the supervised process by PID

An awful, ugly, moderately buggy approach (for instance, able to kill the wrong process in the event of a PID collision) is the following:

while :; do
  ./Server.sh & server_pid=$!
  echo "$server_pid" > server.pid
  wait "$server_pid"
done

...and, to kill the process:

#!/bin/bash
#      ^^^^ - DO NOT run this with "sh scriptname"; it must be "bash scriptname".

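server_pid="$(<server.pid)"; [[ $server_pid ]] || exit
# ask for a clean shutdown first; if the signal can't be delivered
# (for instance, the process is already gone), there is nothing to do
kill "$server_pid" 2>/dev/null || exit 0
# allow 5 seconds for clean shutdown -- adjust to taste
for (( i=0; i<5; i++ )); do
  if kill -0 "$server_pid" 2>/dev/null; then
    sleep 1
  else
    exit 0 # server exited gracefully, nothing else to do
  fi
done

# escalate to a SIGKILL
kill -9 "$server_pid"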

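Note that we're storing the PID of the server in our pidfile and killing that directly, thus avoiding inadvertently targeting the supervision script. One caveat: $! is the PID of the shell running Server.sh, so this only reaches the java process itself if Server.sh starts it with exec; otherwise java is a child of that shell and may outlive it.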
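To see why killing by PID leaves the supervising shell alone, here is a minimal sketch that uses sleep as a stand-in for the server (an assumption for illustration only, not the real Server.sh). The parent starts the child in the background, sends it SIGKILL, and carries on running:

```shell
#!/bin/bash
# Stand-in for the server: a background process whose PID we capture.
sleep 30 & server_pid=$!

# Kill only the child, exactly as the kill script would.
kill -9 "$server_pid"

# wait reports how the child ended; 128 + 9 (SIGKILL) = 137.
wait "$server_pid"; status=$?

# The parent shell is still alive and could start the next loop iteration.
echo "child ended with status $status; supervisor still running"
```

If the supervisor dies as well in your setup, the signal is probably being sent to the wrong PID or to the whole process group rather than to the server alone.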


Monitoring the supervised process and all children via lockfile

Note that this uses some Linux-specific tools -- but you do have the linux tag on your question.

A more robust approach -- which will work across reboots even in the case of pidfile reuse -- is to use a lockfile:

while :; do
  flock -x Server.lock sh Server.sh
done

...and, on the other end:

#!/bin/bash

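# ask all programs having a handle on Server.lock to shut down
# (without an explicit signal, fuser -k sends SIGKILL)
fuser -k -TERM Server.lock
# allow 5 seconds for clean shutdown -- adjust to taste
for ((i=0; i<5; i++)); do
  if fuser -s Server.lock; then
    sleep 1
  else
    exit 0 # nothing holds the lock any more
  fi
done
# escalate to a SIGKILL
fuser -k -KILL Server.lock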
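For the "kill it if it doesn't respond in 60 seconds" part of the question, a rough sketch of the health check might look like the following. The function name responds_within, the URL, and the script name kill-server.sh are all hypothetical; the probe command itself is whatever makes sense for your server. It relies on timeout from GNU coreutils, which can only run external commands, not shell functions:

```shell
#!/bin/bash
# responds_within LIMIT COMMAND [ARGS...]
# Succeeds only if COMMAND exits successfully within LIMIT seconds.
# (Hypothetical helper; timeout comes from GNU coreutils.)
responds_within() {
  local limit=$1
  shift
  timeout "$limit" "$@" >/dev/null 2>&1
}

# Usage (hypothetical URL and script name), e.g. run periodically from cron:
#   responds_within 60 curl -fsS http://localhost:8080/ || ./kill-server.sh
```

Since the supervision loop restarts the server on its own, the watchdog only has to kill it; it should not try to start anything itself.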

Upvotes: 3
