bash pipe vs here-string

Question

I thought these commands were equivalent in bash, but they produce different output. Could you help me understand why?

$ echo "SEBA" | wc
      1       1       5

$ wc <<< "SEBA"
1 1 5

Running on

Ubuntu 20.04.2 LTS
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
wc (GNU coreutils) 8.30

Here are some tests:

$ echo "SEBA" | wc | hexdump 
0000000 2020 2020 2020 2031 2020 2020 2020 2031
0000010 2020 2020 2020 0a35                    
0000018

$ wc <<< "SEBA" | hexdump 
0000000 2031 2031 0a35                         
0000006

$ echo "SEBA" | hexdump 
0000000 4553 4142 000a                         
0000005

$ hexdump <<< "SEBA"
0000000 4553 4142 000a                         
0000005

Barmar · Accepted Answer

When GNU wc gets all its input from files, it uses stat() (or fstat() for stdin) to get the sizes of all the files in bytes. From the total it can determine the maximum number of digits needed for each output field, and it uses only that many digits.

When any of the inputs is a pipe, it's not possible to determine its size ahead of time, so wc falls back to a minimum field width of 7 digits.
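The width rule described above can be approximated in a few lines of shell. This is only a sketch loosely modeled on that description, not the coreutils code itself: `wc_width` is a hypothetical helper name, and it takes file paths as arguments rather than inspecting stdin.

```shell
# Hypothetical helper, not part of coreutils: estimate the field width
# GNU wc would pick for the given file operands.
wc_width() (
  total=0
  min_width=1
  for f in "$@"; do
    if [ -f "$f" ]; then
      # Regular file: the size is known up front, so add it to the total.
      total=$((total + $(wc -c < "$f")))
    else
      # Pipe, device, etc.: size unknown, so require at least 7 columns.
      min_width=7
    fi
  done
  width=${#total}   # number of decimal digits in the combined size
  [ "$width" -lt "$min_width" ] && width=$min_width
  echo "$width"
)

tmp=$(mktemp)
printf 'SEBA\n' > "$tmp"
wc_width "$tmp"       # 5 bytes in a regular file: prints 1
wc_width /dev/null    # not a regular file: prints 7
rm -f "$tmp"
```

The function body runs in a subshell (parentheses instead of braces), so its variables do not leak into the calling shell.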

Here-strings are implemented by copying the string to a temporary file and redirecting stdin to that file, so this case is able to use the optimized field size. But piping from echo doesn't permit this, so it gets 7-digit fields.
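One way to see which kind of stdin a command receives is to test `/dev/stdin` from inside it. The sketch below assumes a Linux-like system where `/dev/stdin` exists; `stdin_kind` is a hypothetical helper name.

```shell
# Hypothetical helper: report what kind of object stdin is connected to.
stdin_kind() {
  if [ -p /dev/stdin ]; then
    echo "pipe"
  elif [ -f /dev/stdin ]; then
    echo "regular file"
  else
    echo "other"
  fi
}

tmp=$(mktemp)
printf 'SEBA\n' > "$tmp"
echo "SEBA" | stdin_kind   # stdin is the read end of a pipe
stdin_kind < "$tmp"        # stdin is redirected from a regular file
rm -f "$tmp"
```

Running `stdin_kind <<< "SEBA"` in the same shell should report a regular file when the here-string is backed by a temporary file, which is the case wc can size with fstat().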

See the functions get_input_fstatus and compute_number_width in the GNU coreutils source.

As noted in a comment, bash 5.1 doesn't use a temporary file for small here-strings or here-documents; it uses a pipe. "Small" may not be very small: the limit is the pipe buffer size. As explained at How big is the pipe buffer?, this defaults to 16K on Mac OS X and 64K on Linux. So you shouldn't depend on this behavior being the same across bash versions.
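If a script needs the counts themselves rather than wc's exact spacing, one option is to let the shell's word splitting discard the padding. This is a minimal sketch, with `wc_fields` as a hypothetical helper name; it assumes the default three-column output (lines, words, bytes) when wc reads stdin.

```shell
# Hypothetical helper: print wc's three counts separated by single spaces,
# regardless of how wide wc chose to make the columns.
wc_fields() (
  # Unquoted on purpose: word splitting drops the leading padding.
  set -- $(wc)
  printf '%s %s %s\n' "$1" "$2" "$3"
)

echo "SEBA" | wc_fields    # prints: 1 1 5
```

Because the padding is stripped before printing, the result is the same whether stdin is a pipe or a temporary file.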
