gphilip

Reputation: 706

Killing child processes from parent's signal handler hangs

I have a problem with POSIX processes that I can't get around.

I have a process which forks several children (the process tree can be complex, not just one level deep). It also keeps track of the active children's PIDs. At some point the parent receives a signal (SIGINT, let's say).

In the signal handler for SIGINT, it iterates over the list of child processes and sends the same signal to them in order to prevent zombies. Now, the problem is that

  1. if the parent does not waitpid() for the child to terminate, the signal never seems to be dispatched (the children keep running)
  2. if the parent calls waitpid() after every kill() sent to a child, it simply hangs there and the child seems to ignore the signal

Parent and children have the same signal handler, as it's installed before forking. Here is the pseudocode:

signal_handler( signal )
    foreach child in children
        kill( child, signal )
        waitpid( child, status )

    // Releasing system resources, etc.
    clean_up()

    // Restore signal handlers.
    set_signal_handlers_to_default()

    // Send back the expected "I exited after XY signal" to the parent by
    // executing the default signal handler again.
    kill( getpid(), signal )

With this implementation the execution stops on the waitpid. If I remove the waitpid, the children keep running.

My guess is that until the signal handler has returned, the signals sent from it are not dispatched to the children. But why aren't they dispatched if I omit the wait?

Thanks a lot in advance!

Upvotes: 2

Views: 7430

Answers (1)

chill

Reputation: 16898

What you describe should work and indeed it does, with the following testcase:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NCHILDREN 3
pid_t child [NCHILDREN];

struct sigaction sa, old;

static void
handler (int ignore)
{
  int i;

  /* Kill the children.  */
  for (i = 0; i < NCHILDREN; ++i)
    {
      if (child [i] > 0)
        {
          kill (child [i], SIGUSR1);
          waitpid (child [i], 0, 0);
        }
    }

  /* Restore the default handler.  */
  sigaction (SIGUSR1, &old, 0);

  /* Kill self.  */
  kill (getpid (), SIGUSR1);
}

int
main ()
{
  int i;

  /* Install the signal handler.  */
  sa.sa_handler = handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction (SIGUSR1, &sa, &old);

  /* Spawn the children.  */
  for (i = 0; i < NCHILDREN; ++i)
    {
      if ((child [i] = fork ()) == 0)
        {
          /* Each of the children: clear the array, wait for a signal
             and exit.  */
          while (i >= 0)
            child [i--] = -1;
          pause ();
          return 0;
        }
    }

  /* Wait to be interrupted by a signal.  */
  pause ();
  return 0;
}

If you see the parent hanging in waitpid, it means the child has not exited. Try attaching a debugger to see where the child is blocked or, easier, run the program under strace(1). How do you clean up your pid array? Make sure the children are not trying to call waitpid with the pid parameter being <= 0. Make sure the children are not blocking or ignoring the signal.

Upvotes: 6
