What happens to wait() if multiple child process terminate simultaneously?

Question

Say I create a process with multiple child process and I call wait() in the main process. If one child terminates, its pid is returned. But what happens if a couple of child process terminate simultaneously? The call should return with one of them, and a second call should return with the other, right? Is there a particular order in which they will return (maybe there is a precedence to the child with lower pid)?

Damon · Accepted Answer

No.

SUSv4 leaves explicitly unspecified in which order (if any) child processes are reaped by one or several wait calls. There is also no "accidential" order that you could rely on, since different Linux kernel versions perform differently. (Source: M. Kerrisk, TLPI, 26.1.1, page 542).

Somewhat related trivia:
You might wonder why you can reliably wait on several child processes that terminate concurrently at all. If you think about how signals work, you might be inclined to believe that it is perfectly possible to lose child termination signals. Signals, a well-known fact, are not queued (except for realtime signals, but SIGCHLD isn't one!). Which means that if you go strictly by the letter of the book, then clearly several children terminating could cause some child termination signals becoming lost!
You can only call wait once at the same time, so you can at most consume one signal synchronously as it is generated, and have a second one made pending before your next call to wait. It appears that there is no way to account for any other signals that are generated in the mean time.

Luckily, that isn't the case. Waiting on child processes demonstrably works fine. Always, 100%. But why?

The reason is simple, the kernel is "cheating". If a child process exits while you are blocked in wait, there is no question as to what's happening. The signal is immediately delivered, the status information is being filled in, the parent unblocks, and the child process goes * poof *.
On the other hand, if a child is exiting and the parent isn't in a wait call (or if the parent is in a wait call reaping another child), the system converts the exiting process to a "zombie".
Now, when the parent process performs a wait, and there are any zombie processes, it won't block waiting for a signal to be generated (which may never happen if all the children have already exited!). Instead, it will just reap one of the zombies, pretending the signal was delivered just now.

What happens to wait() if multiple child process terminate simultaneously?

Answers (1)

Related Questions