Reputation: 99428
How is backgrounding a process (for example, in Bash) implemented in terms of Linux system calls?
The reason I ask is that I don't understand why the bash manual says
asynchronous commands are invoked in a subshell environment,
(if I am correct, "asynchronous commands" means commands run in the background),
while, by using strace, I found that the parent shell process first calls clone() to create a subshell which is a copy of itself, and then the subshell calls execve() to replace itself with the command to run in the background.
This is just like running a foreground process. I don't see how the command is invoked in a subshell.
If I am correct, invoking a command in a subshell would mean that the subshell calls clone() to create a sub-subshell, and then the sub-subshell calls execve() to replace itself with the command to run in the background. But in reality, the subshell doesn't call clone().
For example, in Ubuntu, I run date in an interactive bash shell whose pid is 6913, and at the same time trace that shell from another interactive bash shell with strace.
When running date, the output of tracing the first shell 6913 in the second shell is:
$ sudo strace -f -e trace=process -p 6913
[sudo] password for t:
Process 6913 attached
clone(Process 12918 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f457c05ca10) = 12918
[pid 6913] wait4(-1, <unfinished ...>
[pid 12918] execve("/bin/date", ["date"], [/* 66 vars */]) = 0
[pid 12918] arch_prctl(ARCH_SET_FS, 0x7ff00c632740) = 0
[pid 12918] exit_group(0) = ?
[pid 12918] +++ exited with 0 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED|WCONTINUED, NULL) = 12918
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12918, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffea6781518, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
When running date &, the output of tracing the first shell 6913 in the second shell is:
$ sudo strace -f -e trace=process -p 6913
Process 6913 attached
clone(Process 12931 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f457c05ca10) = 12931
[pid 12931] execve("/bin/date", ["date"], [/* 66 vars */]) = 0
[pid 12931] arch_prctl(ARCH_SET_FS, 0x7f530c5ee740) = 0
[pid 12931] exit_group(0) = ?
[pid 12931] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12931, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 12931
wait4(-1, 0x7ffea6780718, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
Upvotes: 4
Views: 1370
Reputation: 123470
the subshell doesn't call clone()
This is an explicit optimization. bash realizes that there is no point in doing another costly fork if the subshell's only job is to execute a single external command synchronously. From execute_cmd.c:
/* If this is a simple command, tell execute_disk_command that it
might be able to get away without forking and simply exec.
This means things like ( sleep 10 ) will only cause one fork.
If we're timing the command or inverting its return value, however,
we cannot do this optimization. */
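The only real syscall difference between date and date & is therefore that the latter isn't followed by a blocking wait4(): in the traces above, the shell only reaps the background child later, with a WNOHANG wait4() after SIGCHLD arrives.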
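As a rough illustration of that difference seen from the shell's side (a minimal sketch, not taken from bash's source; the variable names and the child's exit status of 3 are made up for the example), a script can start a command with &, carry on immediately, and only later collect the exit status with wait:

```bash
# Start a command asynchronously: the shell forks, the child runs the
# command, and the shell continues without blocking on it.
sh -c 'sleep 1; exit 3' &
bg_pid=$!                      # PID of the child the shell just created

# The shell is free to do other work here; the child is still running.
if kill -0 "$bg_pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi

# Only an explicit wait blocks until the child exits and reaps it.
status=0
wait "$bg_pid" || status=$?
echo "still running right after '&': $still_running; exit status: $status"
```

Without the trailing &, the shell would have blocked for the whole second before reaching the next line.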
The more interesting case that the bash manual alludes to is when you run non-external commands, such as builtins. The difference between let i++ and let i++ & is that the former is evaluated by bash itself, while the latter is evaluated by a subshell which then exits, and therefore has no effect on the parent shell.
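A small script makes this visible. This is only a sketch: it uses the portable form i=$((i + 1)) instead of let i++ so that it also runs under a plain POSIX sh, and the variable names are invented for the example.

```bash
i=0

# Foreground: the assignment is evaluated by the current shell, so it sticks.
i=$((i + 1))
after_foreground=$i

# Background: the shell forks a subshell, the assignment happens in that
# child, and the child exits; the parent's copy of i is untouched.
i=$((i + 1)) &
wait
after_background=$i

echo "after foreground: $after_foreground, after background: $after_background"
```

Both values come out as 1: the second increment happened, but only in the child's copy of the variable.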
Upvotes: 1
Reputation: 58578
The text "asynchronous commands are invoked in a subshell environment" doesn't specifically refer to backgrounding, because backgrounding is an interactive concept from POSIX job control, whereas asynchronous commands occur both interactively and non-interactively.
"Invoked in a subshell environment" is just shell terminology which means that a fork takes place, and these commands run in a child process which cannot modify variables or other state in the parent:
$ VAR=value &
[1] 15479
[1]+ Done VAR=value
$ echo $VAR
$
Because the variable assignment is run in a subshell, it has no effect in the parent shell.
In terms of system calls, backgrounding rests on POSIX job control, which revolves around processes that are organized into process groups, which belong to a session, which is attached to a controlling terminal. Only one group at a time in a session is the foreground process group.
The reason the design revolves around groups rather than individual processes is that jobs consist of multiple processes when piping is used. For instance, sort -u file | grep foo creates a process group with two processes.
The shell itself is also in a process group. When the shell is prompting you for input, that process group is in the foreground. When the shell executes a job in the foreground, it places itself into the background by making the job's process group the terminal's foreground group; the job control calls involved are typically setpgid() and tcsetpgrp().
When you send a Ctrl-Z to the TTY, it generates a SIGTSTP signal to every process in the foreground process group. The shell detects this change in child status (via waitpid or some such) and then shuffles that group to the background, putting itself in the foreground again to receive TTY input.
When you're talking to the shell, all jobs are in the background, whether they are running or not: the shell is the foreground process group, so all the other jobs are not. With the bg command, you are just changing the running state of a suspended background job. Ctrl-Z sent a SIGTSTP which suspended it, and the shell moved it to the background. bg will resume it, allowing it to execute. However, if a background job is resumed and tries to obtain input from the TTY, it gets a SIGTTIN signal and is suspended again:
$ cat &
[1] 12620
$ bg
[1]+ cat &
$ # hit Enter
[1]+ Stopped cat
$ bg
[1]+ cat &
$ # hit Enter
[1]+ Stopped cat
cat wants to read from the TTY, so when we put it in the background, it gets a SIGTTIN from the kernel's TTY subsystem, which stops it. The shell detects this signal-driven change in status, much as it does for Ctrl-Z/SIGTSTP, and prints a message telling us that the job stopped. Every time we bg it (resume it), the same thing happens. The shell dispatches cat (probably with a SIGCONT), and cat immediately resumes trying to get input from the TTY, which responds with "you're not a member of the foreground process group, bad kitty: SIGTTIN for you". This whole time, cat never leaves the background.
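The stop/resume half of this can be tried without a terminal. The sketch below is an approximation, not what bash literally does: it uses SIGSTOP in place of the terminal-generated SIGTSTP/SIGTTIN (so it does not depend on a controlling TTY), reads the process state from Linux's /proc/PID/stat, assumes a sleep that accepts fractional seconds, and the helper names state_of and wait_for_state are made up for the example.

```bash
# The field after the "(comm)" entry in /proc/PID/stat is the state
# letter; T means stopped by a signal.
state_of() { sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f1; }

# Poll (for up to about 5 s) until the process is, or is no longer, stopped.
wait_for_state() {   # usage: wait_for_state PID stopped|running
    n=0
    while [ "$n" -lt 50 ]; do
        s=$(state_of "$1")
        case "$2:$s" in
            stopped:T) return 0 ;;
            running:T) ;;              # still stopped, keep polling
            running:*) return 0 ;;
        esac
        sleep 0.1
        n=$((n + 1))
    done
    return 1
}

sleep 30 &
pid=$!

kill -STOP "$pid"                  # suspend the job, as a stop signal would
wait_for_state "$pid" stopped
stopped_state=$(state_of "$pid")

kill -CONT "$pid"                  # the signal bg is presumed to send
wait_for_state "$pid" running
resumed_state=$(state_of "$pid")

kill "$pid"                        # clean up the background sleep
wait "$pid" 2>/dev/null || true
echo "while stopped: $stopped_state, after SIGCONT: $resumed_state"
```

The difference with cat in the transcript above is that sleep never reads from the TTY, so after SIGCONT it simply keeps running instead of being stopped again by SIGTTIN.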
Upvotes: 1