L. Yao

Reputation: 21

The parent process receives SIGCHLD from its children only once or twice, no matter how many times it calls fork()

I am doing almost the same thing as here: implementing a program with signal() and sigprocmask() in which the parent handles every SIGCHLD sent by its children. (I also tested the code from the link, and the result is the same: the parent receives SIGCHLD only once or twice, no matter how many times it forks.)

The expected result would be (the number of "add" lines equals the number of "del" lines):

add job 12987
add job 12988
Wed Dec 19 22:20:59 CST 2018
del from 12987
Wed Dec 19 22:21:00 CST 2018
del from 12988
add job 12989
add job 12990
del from 12989
Wed Dec 19 22:21:01 CST 2018
add job 12991
Wed Dec 19 22:21:02 CST 2018
del from 12990
Wed Dec 19 22:21:03 CST 2018
del from 12991

But the actual result is (not every SIGCHLD is caught by the parent process):

add job 12987
add job 12988
Wed Dec 19 22:20:59 CST 2018
del from 12987
now the list is: 12988
Wed Dec 19 22:21:00 CST 2018
del from 12988
now the list is: 
add job 12989
add job 12990
Wed Dec 19 22:21:01 CST 2018
add job 12991
Wed Dec 19 22:21:02 CST 2018
add job 12992
Wed Dec 19 22:21:03 CST 2018
add job 12993
Wed Dec 19 22:21:04 CST 2018
add job 12994
Wed Dec 19 22:21:05 CST 2018
add job 13091
Wed Dec 19 22:21:06 CST 2018
add job 13092
Wed Dec 19 22:21:07 CST 2018
Wed Dec 19 22:21:08 CST 2018 

And here's my code:

#include "apue.h"
#include <sys/wait.h>
#include <sys/signal.h>
#include <errno.h>

void printJobs();
void addJob(int);
void delJob();

void handler(int sig)
{
    sigset_t mask_all, pre_all;
    sigfillset(&mask_all); // fill all bits of the mask
    pid_t pid;
    while ((pid = waitpid(-1, NULL, 0)) > 0) {
        sigprocmask(SIG_BLOCK, &mask_all, &pre_all);
        printf("del from %d\n", pid);
        delJob(pid);
        sigprocmask(SIG_UNBLOCK, &pre_all, NULL);
    }
    if (errno != ECHILD)
        printf("waitpid error\n");
}

int main(int argc, char **argv)
{
    pid_t pid;
    sigset_t mask_all, mask_one, pre_one;

    sigfillset(&mask_all);
    sigemptyset(&mask_one);
    sigaddset(&mask_one, SIGCHLD);
    signal(SIGCHLD, handler);
    for (int i = 0; i < 10; ++i) {
        sigprocmask(SIG_BLOCK, &mask_one, &pre_one); // block SIGCHLD
        if ((pid = fork()) == 0) {
            sigprocmask(SIG_SETMASK, &pre_one, NULL);
            sleep(1);
            execve("/bin/date", argv, NULL);
        }
        sigprocmask(SIG_BLOCK, &mask_all, NULL);     // block all signals
        addJob(pid);
        sigprocmask(SIG_SETMASK, &pre_one, NULL); // unblock SIGCHLD
        sleep(1);
    }
    exit(0);
}

typedef struct Node {
    int val;
    struct Node *next;
} Node, *pNode;

pNode phead = NULL, ptail = NULL;

void printJobs()
{
    pNode pt = phead;
    while (pt) {
        printf("%d", pt->val);
        pt = pt->next;
    }
    printf("\n");
}

void delJob(int pid)
{
    if (ptail) {
        pNode pt = phead, pre = NULL;
        while (pt && pt->val != pid) {
            pre = pt;
            pt = pt->next;
        }
        if (!pt) {
            printf("No job %d\n", pid);
            return;
        }
        if (pt == phead) { // removing the head node
            phead = pt->next;
            if (!phead)
                ptail = NULL; // the list is now empty
            free(pt);
        } else { // removing a node after the head
            printf("del %d\n", pt->val);
            pre->next = pt->next; // unlink without dropping the rest of the list
            if (pt == ptail)
                ptail = pre;
            free(pt);
        }
        printf("now the list is: ");
        printJobs();
    } else {
        printf("No job %d\n", pid);
    }
}

void addJob(int pid)
{
    printf("add job %d\n", pid);
    pNode pt = malloc(sizeof(Node));
    pt->val = pid;
    pt->next = NULL;
    if (!phead) {
        phead = ptail = pt;
    } else {
        ptail->next = pt;
        ptail = pt;
    }
}

Upvotes: 2

Views: 2501

Answers (1)

John Bollinger

Reputation: 180058

Your parent process does not wait for its children to finish (via wait() or waitpid()), so nothing ensures that the SIGCHLD signals they emit when they do will be delivered to it while it's still alive. That the signal handler performs a wait does not help here, because that does not prevent the process from terminating while children are still alive. The parent and all the children will probably terminate at about the same time, so it is not very surprising that the parent reports signals only from one or two children, even if we assume that calling printf() from a signal handler will behave as you seem to expect.

Additionally, all the sleep()ing the program does probably confuses the issue more than it clarifies. Certainly it is not the right tool for managing inter-process synchronization and timing.

Consider this heavily-trimmed and modified derivative for comparison:

#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <errno.h>

volatile sig_atomic_t signal_count;

void handler(int sig) {
    signal_count += 1;
}

int main(int argc, char *argv[]) {
    pid_t pid;

    signal_count = 0;
    signal(SIGCHLD, handler);
    for (int i = 0; i < 10; ++i) {
        if ((pid = fork()) == 0) {
            sleep(1);
            execve("/bin/date", argv, NULL);
            _exit(127); // reached only if execve() fails; keeps the child out of the fork loop
        }
        sleep(1);
    }

    // wait for the children to terminate
    while (wait(NULL) != -1) { /* empty */ }

    printf("Process %d handled %d SIGCHLD signals\n", (int) getpid(), (int) signal_count);
    exit(0);
}

My test run of that program produced this output:

Wed Dec 19 09:56:08 CST 2018
Wed Dec 19 09:56:09 CST 2018
Wed Dec 19 09:56:09 CST 2018
Wed Dec 19 09:56:10 CST 2018
Wed Dec 19 09:56:10 CST 2018
Wed Dec 19 09:56:11 CST 2018
Wed Dec 19 09:56:11 CST 2018
Wed Dec 19 09:56:11 CST 2018
Wed Dec 19 09:56:12 CST 2018
Wed Dec 19 09:56:12 CST 2018
Process 2169 handled 10 SIGCHLD signals

Note especially the last line. It confirms that all 10 expected signals were handled by the original parent process.

Addendum

As @zwol observed in comments, to the extent that the parent can rely on blocking in wait() or waitpid() to collect its children, it does not need to register a handler for SIGCHLD at all. It can instead perform whatever work is needed each time wait() returns a non-error value. Using a signal handler to collect children serves the opposite case: you want to avoid blocking the parent process, or to avoid having to work out where in the program to try collecting children in a non-blocking manner.
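As a rough illustration of the handler-free approach, here is a minimal sketch. The function name spawn_and_reap() is made up for this example, and the children simply exit at once as a stand-in for real work; the per-child bookkeeping would go where the comment indicates.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not from the original post): fork n children that
 * exit immediately, then collect them with blocking wait() and no SIGCHLD
 * handler. Returns the number of children collected, or -1 if a fork()
 * failed. */
int spawn_and_reap(int n) {
    int started = 0;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid < 0) {
            break;          /* could not create this child; stop forking */
        }
        if (pid == 0) {
            _exit(0);       /* child: stand-in for real work or an exec */
        }
        started++;
    }

    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = wait(&status);
        if (pid > 0) {
            /* per-child bookkeeping, such as delJob(pid), would go here */
            reaped++;
        } else if (errno != EINTR) {
            break;          /* ECHILD: no children are left to collect */
        }
    }
    return (started == n) ? reaped : -1;
}
```

Because the parent blocks in wait(), every child is collected exactly once, regardless of how the corresponding signals are delivered.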

Nevertheless, it may be the case that although you don't normally want to block to collect children, the parent wants to ensure at some point -- perhaps as it's preparing to terminate -- that it collects all remaining children. It might make sense in such a case to use both approaches to collecting children in the same program.

Addendum 2

With thanks to @NominalAnimal, I observe, too, that the usual implementations of signal handling do not queue multiple non-realtime signals of the same type to the same thread at the same time. If a signal is delivered to a thread that already has a signal of that type pending, then the new signal produces no additional effect. For that reason, although I demonstrate receipt of a separate SIGCHLD for each child, I'm not guaranteed to see more than one, as the second through tenth could be delivered while the first was still pending. Keeping signal handler implementations short reduces the likelihood of "losing" signals that way, but cannot eliminate it.

Note well, however, that these particular signal-handling details do not prevent wait() and waitpid() from collecting the process's terminated children.
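One common way to cope with coalesced signals, if you do want a handler, is to let each invocation collect every child that has already terminated by calling waitpid() with WNOHANG in a loop. The following is only a sketch under that assumption: reap_children() and count_reaped() are names invented for this example, it uses sigaction() rather than signal(), and the children exit immediately in place of real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t reaped_count;

/* One SIGCHLD may stand for several terminated children, so drain them
 * all. WNOHANG keeps the handler from blocking while children still run. */
static void reap_children(int sig) {
    (void) sig;
    int saved_errno = errno;            /* waitpid() may change errno */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        reaped_count += 1;
    }
    errno = saved_errno;
}

/* Hypothetical driver: start n children that exit at once and return how
 * many the handler collected, or -1 on a setup or fork() failure. */
int count_reaped(int n) {
    struct sigaction sa, old_sa;
    sigset_t block_chld, old_mask, wait_mask;

    reaped_count = 0;
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) {
        return -1;
    }

    /* Block SIGCHLD so it can only be delivered inside sigsuspend() */
    sigemptyset(&block_chld);
    sigaddset(&block_chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block_chld, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    int started = 0;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0);                   /* child: stand-in for real work */
        }
        if (pid > 0) {
            started++;
        }
    }

    /* Sleep until the handler has accounted for every started child */
    while (reaped_count < started) {
        sigsuspend(&wait_mask);
    }

    sigaction(SIGCHLD, &old_sa, NULL);  /* restore the previous disposition */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return (started == n) ? (int) reaped_count : -1;
}
```

The handler may run fewer times than there are children, but the count of collected children still matches, because it is waitpid() and not the signal that accounts for each one.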

Upvotes: 3
