emesday

Reputation: 6186

How to zip stdin along with stdout line by line

I have a simple command (my_cc) that computes the number of characters in each line.

For the file text shown below, it prints 5, 6, 7, and 8, one number per line.

$ cat text
12345
123456
1234567
12345678

$ cat text | ./my_cc 
5
6
7
8

My question is how to zip stdin with stdout line by line, without starting my_cc more than once, so that I get something like this:

$ cat text | some_magic_command with my_cc
12345 5
123456 6
1234567 7
12345678 8

A possible answer is:

$ cat text | xargs -I {} bash -c "echo {} | ./my_cc | sed 's/^/{} /g'"
12345 5
123456 6
1234567 7
12345678 8

But this starts a separate my_cc process for every line in text.

I cannot use this approach because my_cc is too expensive to start once per line. I also cannot modify my_cc.

Upvotes: 1

Views: 111

Answers (2)

melpomene

Reputation: 85767

If

  1. my_cc doesn't buffer its output, but writes a line of output immediately after receiving each line of input (most commands don't behave this way when their output goes to a pipe), and
  2. your text doesn't come from a file but is e.g. generated from another command on the fly,

you can do the following:

my_cc() {
    perl -nle 'BEGIN { $| = 1 } print length'
}

coproc my_cc
while read -r; do
    printf '%s ' "$REPLY"
    printf '%s\n' "$REPLY" >&${COPROC[1]}
    read -r <&${COPROC[0]}
    printf '%s\n' "$REPLY"
done < <( echo '12345
123456
  .  
1234567
12345678' )

exec {COPROC[0]}<&- {COPROC[1]}>&-
wait $COPROC_PID

Output:

12345 5
123456 6
  .   5
1234567 7
12345678 8

Note:

Condition #1 is essential. If my_cc buffers its output, this code will deadlock.

Condition #2 is not strictly required. You could easily run this code on a file (while read -r; do ... done < sometextfile), but a file can be read multiple times, so simpler solutions (that don't require condition #1) are possible.
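
As a rough sketch of one such simpler solution for input that does come from another command: store the generated lines in a temporary file first, then run my_cc once over that file and join the two with paste. The names generate_input and the awk stand-in for my_cc are hypothetical, and this assumes the input is finite, fits on disk, and that mktemp is available. Because my_cc only sees end of input once, it may buffer its output however it likes.

```shell
# Hypothetical stand-ins: generate_input produces lines on the fly,
# my_cc prints the length of each line (buffering does not matter here).
generate_input() { printf '%s\n' 12345 123456 '  .  ' 1234567 12345678; }
my_cc() { awk '{ print length($0) }'; }

# Capture the generated input so it can be read twice.
tmp=$(mktemp)
generate_input > "$tmp"

# my_cc runs once over the whole file; paste joins line N of the file
# with line N of my_cc's output, separated by a single space.
result=$(my_cc < "$tmp" | paste -d ' ' "$tmp" -)
rm -f "$tmp"

printf '%s\n' "$result"
```

The trade-off is that no output appears until the generating command has finished, whereas the coprocess loop produces each pair as soon as its input line arrives.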

Explanation:

  • my_cc is defined as a shell function to stand in for your actual command. It does what you described (prints the length of each line), but $| = 1 deserves comment: This statement enables autoflush mode on the currently selected output handle (which defaults to stdout), i.e. output is written immediately after each print command.

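  • coproc is a bash keyword that runs the specified command in the background (as a co-process). Its standard input and output are connected to the shell through pipes whose file descriptors are stored in the COPROC array: COPROC[1] writes to the co-process and COPROC[0] reads from it.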

  • The while read -r loop reads input line by line from another command (here played by echo '...').

  • Each line read ($REPLY) is first printed followed by a space, then sent to the coprocess.

  • Then we read a single line of output from the coprocess and print it followed by a newline.

  • At the end we close the file descriptors of our coprocess and wait for it to terminate.
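
For shells without coproc, the same steps can be sketched with two named pipes instead. This is a substitute technique, not the code above: the pipe names, the zip_lines helper, and the shell-loop stand-in for my_cc are all hypothetical, and mktemp -d is assumed to be available. Condition #1 still applies, so a my_cc that buffers its output would deadlock here too.

```shell
# Hypothetical stand-in for my_cc: prints each line's length immediately
# (every printf call writes its output right away).
my_cc() {
    while IFS= read -r line; do
        printf '%s\n' "${#line}"
    done
}

# Private directory holding the two named pipes.
dir=$(mktemp -d)
mkfifo "$dir/to_cc" "$dir/from_cc"

# Start the single long-running my_cc process in the background.
my_cc < "$dir/to_cc" > "$dir/from_cc" &

# Open our ends: fd 3 writes to my_cc, fd 4 reads its answers.
exec 3> "$dir/to_cc" 4< "$dir/from_cc"

zip_lines() {
    while IFS= read -r input; do
        printf '%s\n' "$input" >&3      # send one line to my_cc
        IFS= read -r count <&4          # read its one-line answer
        printf '%s %s\n' "$input" "$count"
    done
}

result=$(printf '%s\n' 12345 123456 '  .  ' 1234567 12345678 | zip_lines)
printf '%s\n' "$result"

# Close our ends so my_cc sees end of input, then wait and clean up.
exec 3>&- 4<&-
wait
rm -rf "$dir"
```

The order of the two opens matters: the background job opens to_cc before from_cc, and the exec line does the same, so neither side blocks waiting for the other.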

Upvotes: 1

Benjamin W.

Reputation: 52132

You can use paste:

paste -d ' ' text <(./my_cc < text)

This joins each line of text with the corresponding line of your command's output, separated by a space. my_cc runs only once, over the whole file.

If you have a shell that doesn't support process substitution, you can pipe the command's output into paste instead and use - to make paste read it from standard input:

./my_cc < text | paste -d ' ' text -

Upvotes: 3
