Peter P.

Reputation: 3517

How to properly wait for bash child process to complete when trapping signals

We have a wrapper script that launches a DelayedJob worker in the background. This script waits until the DelayedJob worker completes before exiting. The wrapper script is the main entry point of a Docker container and sets up the environment the DJ worker needs to run.

However, when we issue docker stop, the container should wait until the DJ worker exits gracefully (or until the maximum timeout), but this is not happening: the container exits immediately.

Running docker stop on the container sends SIGTERM to the main process, which is the wrapper script. In the wrapper script, we trap SIGTERM and pass the signal on to the DJ worker process.

This still doesn't work. I've created a test case using simple Bash scripts which illustrates the problem.

Script p1:

#!/bin/bash
echo "P1: starting p1 and running p2 in bg"
exit_script() {
  echo "P1: Caught sigterm in p1, sending TERM to p2"
  kill -TERM "$child"
}

trap exit_script SIGINT SIGTERM

./p2 &
child=$!

echo "P1: waiting for p2 ($child)"
wait "$child"

echo "P1: Finished waiting for p2, exiting p1"

Script p2:

#!/bin/bash
echo "P2: starting p2"
exit_script() {
  echo "P2: Caught sigterm"
  NEXT_WAIT_TIME=0
  until [ $NEXT_WAIT_TIME -eq 10 ]; do
    echo "P2: EXIT_SCRIPT loop $NEXT_WAIT_TIME"
    sleep $(( NEXT_WAIT_TIME++ ))
  done  
  exit
}

trap exit_script SIGINT SIGTERM

echo "P2: Sleeping for a while"

NEXT_WAIT_TIME=0
until [ $NEXT_WAIT_TIME -eq 10 ]; do
  echo "P2: Main Loop $NEXT_WAIT_TIME"
  sleep $(( NEXT_WAIT_TIME++ ))
done

echo "P2: Finished sleeping in p2"

Output:

MBP:$ ./p1
P1: starting p1 and running p2 in bg
P1: waiting for p2 (74039)
P2: starting p2
P2: Sleeping for a while
P2: Main Loop 0
P2: Main Loop 1
P2: Main Loop 2
P2: Main Loop 3
P2: Main Loop 4
P1: Caught sigterm in p1, sending TERM to p2
P1: Finished waiting for p2, exiting p1
MBP:$ P2: Caught sigterm
P2: EXIT_SCRIPT loop 0
P2: EXIT_SCRIPT loop 1
P2: EXIT_SCRIPT loop 2
P2: EXIT_SCRIPT loop 3
P2: EXIT_SCRIPT loop 4
P2: EXIT_SCRIPT loop 5
P2: EXIT_SCRIPT loop 6
P2: EXIT_SCRIPT loop 7
P2: EXIT_SCRIPT loop 8
P2: EXIT_SCRIPT loop 9

As you can see, the line after p1's call to wait is executed BEFORE the code in p2's exit_script function, which is called when p2 traps the signal.
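The early return can be reproduced in a single script. This is a minimal sketch, assuming bash: the background sleep is a hypothetical stand-in for p2, and the self-sent SIGTERM only simulates docker stop. It shows wait coming back with a status above 128 while the child is still running.

```shell
#!/bin/bash
# Sketch: a trapped signal makes the wait builtin return before the
# child it is waiting on has exited.
trap_ran=no
on_term() {
  trap_ran=yes
  echo "trap: caught TERM"
}
trap on_term TERM

sleep 5 &              # hypothetical stand-in for p2: keeps running for a while
child=$!

# Simulate docker stop: send SIGTERM to this shell after 1 second.
# Inside the subshell, $$ still expands to the parent shell's PID.
( sleep 1; kill -TERM $$ ) &

wait "$child"
status=$?              # > 128 here: wait was interrupted, not completed

if kill -0 "$child" 2>/dev/null; then
  child_alive=yes
else
  child_alive=no
fi
echo "wait returned $status; child still running: $child_alive"

kill "$child" 2>/dev/null   # clean up the demo child
```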

One workaround is to replace wait with a timeout loop that checks whether the child PID still exists, but why doesn't wait work as expected? Am I using wait incorrectly?
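The polling workaround might look roughly like the sketch below. MAX_WAIT and the sleep 3 child are hypothetical stand-ins for the real timeout and for ./p2. One caveat: bash defers a trap until the current foreground command (here the sleep 1 inside the loop) finishes, so forwarding the signal can lag by up to a second.

```shell
#!/bin/bash
# Sketch of the polling workaround: instead of a single wait, poll the
# child PID with kill -0 until it disappears or a timeout expires.
MAX_WAIT=30   # hypothetical limit in seconds

forward_term() {
  echo "P1: caught signal, forwarding TERM to child $child"
  kill -TERM "$child" 2>/dev/null
}
trap forward_term INT TERM

sleep 3 &     # hypothetical stand-in for ./p2
child=$!

waited=0
# kill -0 delivers no signal; it only checks whether the process still exists.
while kill -0 "$child" 2>/dev/null && [ "$waited" -lt "$MAX_WAIT" ]; do
  sleep 1     # a trapped signal is handled once this sleep returns
  waited=$((waited + 1))
done
echo "P1: stopped waiting after about ${waited}s, exiting p1"
```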

Upvotes: 2

Views: 1237

Answers (1)

estabroo

Reputation: 189

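The wait is interrupted by the incoming signal and does not get restarted. When bash is waiting for a background job via the wait builtin and receives a signal for which a trap has been set, wait returns immediately with an exit status greater than 128, and the trap runs right after that. Execution then continues with the line after wait, even though p2 is still running. You should be able to just add another wait call to force it to finish waiting. There is probably a better way to do that, though.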

echo "P1: waiting for p2 ($child)"
wait "$child"
wait "$child"

echo "P1: Finished waiting for p2, exiting p1"
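A somewhat more general variant of the double wait is to loop until the child is actually gone, so it does not matter how many signals arrive. This is a sketch, not a drop-in replacement: the backgrounded subshell is a hypothetical stand-in for p2 that needs about two seconds to shut down, and the self-sent TERM only simulates docker stop.

```shell
#!/bin/bash
# Sketch: keep calling wait until the child has really exited,
# instead of a fixed second wait.
exit_script() {
  echo "P1: caught TERM, sending TERM to child $child"
  kill -TERM "$child" 2>/dev/null
}
trap exit_script INT TERM

# Hypothetical stand-in for ./p2: on TERM it "cleans up" for 2s, then exits 0.
(
  trap 'sleep 2; exit 0' TERM
  while :; do sleep 1; done
) &
child=$!

# Demo only: simulate docker stop by sending TERM to ourselves after 1s.
( sleep 1; kill -TERM $$ ) &

while :; do
  wait "$child"
  status=$?
  # If the child is gone, $status is its real exit status; otherwise
  # wait was cut short by a trapped signal, so wait again.
  kill -0 "$child" 2>/dev/null || break
done
echo "P1: child exited with status $status, exiting p1"
```

Because the loop tests kill -0 rather than the exit status, a child that really was killed by a signal (status 143, for example) also ends the loop, and $status then holds that real exit status.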

Upvotes: 3
