Dragon

Reputation: 303

Why does bash "forget" about my background processes?

I have this code:

#!/bin/bash
pids=()
for i in $(seq 1 999); do
  sleep 1 &
  pids+=( "$!" )
done
for pid in "${pids[@]}"; do
  wait "$pid"
done

I expect the following behavior:

Instead, I get this error:

./foo.sh: line 8: wait: pid 24752 is not a child of this shell

(repeated 171 times with different pids)

If I run the script with a shorter loop (50 iterations instead of 999), I get no errors.

What's going on?

Edit: I am using GNU bash 4.4.23 on Windows.

Upvotes: 2

Views: 209

Answers (3)

oguz ismail

Reputation: 50750

POSIX says:

The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.

{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:

$ getconf CHILD_MAX
13195

Bash stores the exit statuses of at most twice that many exited background processes in a circular buffer, and reports "not a child of this shell" when you call wait on the PID of an older one whose entry has been overwritten. You can see how it's implemented here.
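
If the individual exit codes are not needed, one way to sidestep the PID lookup entirely is to call wait with no operands, which waits for all background children. A minimal sketch, assuming bash; the function name wait_for_all and the job count are made up for illustration, and true stands in for the real work:

```shell
#!/bin/bash
# Hypothetical helper: start COUNT trivial background jobs, then wait for
# all of them at once instead of waiting on each saved PID.
wait_for_all() {
  local count=$1 i
  for ((i = 0; i < count; i++)); do
    true &            # placeholder for the real background job
  done
  wait                # no operands: wait for every background child
  echo "$?"           # status of the bare wait itself, not of any one job
}

wait_for_all 200
```

The trade-off is that a bare wait does not report which jobs failed, so this only fits cases where completion matters and per-job status does not.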

Upvotes: 4

KamilCuk

Reputation: 140880

I can reproduce this on Arch Linux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and with earlier versions. I can't reproduce it with bash:5.1.0.

What's going on?

It looks like a bug in your version of Bash. Bash 5.1 included several improvements to jobs.c and wait.def, and the changelog mentions "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler". It appears to be an issue with handling a SIGCHLD signal while another SIGCHLD is already being handled.

Upvotes: 1

tjm3772

Reputation: 3144

The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:

  1. sleep is executed in the background via a fork+exec.
  2. At some point, sleep exits leaving behind a zombie.
  3. That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.

However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then, when you call wait, the shell just hands you whatever value is stored in memory, even though the zombie may be long gone by then.
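
A small sketch of that behavior, assuming bash: the child below has already exited by the time wait is called, yet wait still returns its exit code from the shell's saved record. The function name stored_status_demo and the exit code 7 are arbitrary choices for illustration.

```shell
#!/bin/bash
# Hypothetical demo: wait on a child that finished well before wait is called.
stored_status_demo() {
  (exit 7) &          # background child that exits immediately with status 7
  local pid=$!
  sleep 1             # give the child ample time to exit
  wait "$pid"         # status comes from the shell's saved record
  echo "$?"
}

stored_status_demo    # prints 7
```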

Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
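
One way to stay under whatever that limit is in a given environment is to wait in batches, so only a handful of exit statuses are ever pending. This is a sketch under assumptions, not a guaranteed fix: the function name run_in_batches, the batch size of 20, and the use of true as the job are all illustrative, and a safe batch size depends on your bash version and platform.

```shell
#!/bin/bash
# Hypothetical helper: start TOTAL jobs, but collect exit statuses after
# every BATCH_SIZE jobs so few unwaited children accumulate.
run_in_batches() {
  local total=$1 batch_size=$2
  local failures=0 pids=() pid i
  for ((i = 1; i <= total; i++)); do
    true &                              # placeholder for the real job
    pids+=("$!")
    # Drain the batch when it is full or when the last job has started.
    if (( ${#pids[@]} >= batch_size || i == total )); then
      for pid in "${pids[@]}"; do
        wait "$pid" || failures=$((failures + 1))
      done
      pids=()
    fi
  done
  echo "$failures"                      # number of jobs whose wait was non-zero
}

run_in_batches 100 20
```

Because each batch is fully waited on before the next one starts, this also caps the number of jobs running at once, which may or may not be desirable for the real workload.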

Upvotes: 3
