Reputation: 2020
I came across an example for the using tee
utility in the tee
info page:
wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) > dvd.iso
I looked up the >(...)
syntax and found something called "process substitution". From what I understand, it makes a process look like a file that another process could write/append its output to. (Please correct me if I'm wrong on that point.)
How is this different from a pipe? (|
) I see a pipe is being used in the above example—is it just a precedence issue? or is there some other difference?
Upvotes: 11
Views: 3504
Reputation: 2955
One major difference is the propagation of return values / exit codes (I'll use simpler commands to illustrate):
Pipe:
$ ls -l /notthere | tee listing.txt
ls: cannot access '/notthere': No such file or directory
$ echo $?
0
-> exit code of tee
is propagated
Process substitution:
$ ls -l /notthere > >(tee listing.txt)
ls: cannot access '/notthere': No such file or directory
$ echo $?
2
-> exit code of ls
is propagated
There are of course several methods to work around this (e.g. set -o pipefail
, variable PIPESTATUS
), but I think it's worth mentioning since this is the default behavior.
Yet another rather subtle, yet potentially annoying difference lies in subprocess termination (best illustrated using commands that produce lots of output):
Pipe:
#!/usr/bin/env bash
tar --create --file /tmp/etc-backup.tar --verbose --directory /etc . 2>&1 | tee /tmp/etc-backup.log
retval=${PIPESTATUS[0]}
(( ${retval} == 0 )) && echo -e "\n*** SUCCESS ***\n" || echo -e "\n*** FAILURE (EXIT CODE: ${retval}) ***\n"
-> after the line containing the pipe construct, all commands of the pipe have already terminated (otherwise PIPESTATUS
could not contain their respective exit codes)
Process substitution:
#!/usr/bin/env bash
tar --create --file /tmp/etc-backup.tar --verbose --directory /etc . &> >(tee /tmp/etc-backup.log)
retval=$?
(( ${retval} == 0 )) && echo -e "\n*** SUCCESS ***\n" || echo -e "\n*** FAILURE (EXIT CODE: ${retval}) ***\n"
-> after the line containing the process substitution, the command within >(...)
, i.e. tee
in this example, may still be running, potentially causing desynchronized console output (SUCCESS / FAILURE message gets mixed in with still flowing tar
output) [*]
[*] Can be reproduced on the framebuffer console, but does not seem to affect GUI terminals like KDE's Konsole (likely due to different buffering strategies).
Upvotes: 5
Reputation: 123600
There's no benefit here, as the line could equally well have been written like this:
wget -O - http://example.com/dvd.iso | tee dvd.iso | sha1sum > dvd.sha1
The differences start to appear when you need to pipe to/from multiple programs, because these can't be expressed purely with |
. Feel free to try:
# Calculate 2+ checksums while also writing the file
wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) >(md5sum > dvd.md5) > dvd.iso
# Accept input from two 'sort' processes at the same time
comm -12 <(sort file1) <(sort file2)
They're also useful in certain cases where you for any reason can't or don't want to use pipelines:
# Start logging all error messages to file as well as disk
# Pipes don't work because bash doesn't support it in this context
exec 2> >(tee log.txt)
ls doesntexist
# Sum a column of numbers
# Pipes don't work because they create a subshell
sum=0
while IFS= read -r num; do (( sum+=num )); done < <(curl http://example.com/list.txt)
echo "$sum"
# apt-get something with a generated config file
# Pipes don't work because we want stdin available for user input
apt-get install -c <(sed -e "s/%USER%/$USER/g" template.conf) mysql-server
Upvotes: 14