Reputation: 10393

Add a big buffer to a pipe between two commands

Given a bash command line of the form

commandA | commandB

I want to add a buffer of size ~1MB that sits between commandA and commandB. I would expect to be able to do this with something of the form

commandA | BUFFER | commandB

but what is the command to use for BUFFER?

Remark: I want to do this in order to decouple the two commands to make them parallelize better. The problem is that commandB processes data in large chunks, which currently means that commandA blocks until commandB is done with a chunk. So everything runs sequentially :-(

Upvotes: 36

Answers (7)

ArturZ

Reputation: 41

Instead of using an intermediate command in the pipe, you can increase the pipe capacity with the pipesz command (available since util-linux 2.39):

commandA | pipesz -i -s 1M commandB

pipesz -o -s 1M commandA | commandB

or you can use a Python script to do the same:

commandA | python -c 'import sys, os, fcntl; fcntl.fcntl(0, fcntl.F_SETPIPE_SZ, 1024*1024); os.execvp(sys.argv[1], sys.argv[1:])' commandB

The fcntl.F_SETPIPE_SZ constant is available since Python 3.10. For older versions, replace it with its Linux value, 1031.
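If pipesz is not available, the Python one-liner can be wrapped in a reusable shell function. This is a minimal sketch, assuming Linux and python3 on PATH. The name setpipesz is hypothetical, and the helper falls back to 1031 when fcntl.F_SETPIPE_SZ is missing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: setpipesz SIZE COMMAND [ARGS...]
# Enlarges the pipe on stdin to SIZE bytes, then execs COMMAND in place of
# the Python process, so COMMAND reads from the enlarged pipe.
# Assumes Linux and python3; stdin must be a pipe.
# 1031 is the Linux value of F_SETPIPE_SZ, used on Python < 3.10.
setpipesz() {
    python3 -c '
import fcntl, os, sys
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
fcntl.fcntl(0, F_SETPIPE_SZ, int(sys.argv[1]))
os.execvp(sys.argv[2], sys.argv[2:])
' "$@"
}

# Usage: commandA | setpipesz 1048576 commandB
```

Because the Python process execs commandB, commandB inherits the enlarged pipe on its stdin and no extra process stays in the pipeline.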

The maximum pipe capacity for non-root users is limited by the value in /proc/sys/fs/pipe-max-size (1 MiB by default).
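To see the limit and the default pipe capacity on a given machine, something like the following sketch can be used. It assumes Linux and python3 on PATH, and passes 1032, the Linux value of F_GETPIPE_SZ, directly. On a typical Linux system the default capacity is 64 KiB.

```shell
#!/usr/bin/env bash
# Sketch, assuming Linux and python3 on PATH.
# Largest size an unprivileged process may request with F_SETPIPE_SZ:
max=$(cat /proc/sys/fs/pipe-max-size)
echo "pipe-max-size: $max bytes"

# Capacity of an ordinary pipe, as seen by the reading process.
# 1032 is the Linux value of F_GETPIPE_SZ.
cap=$(printf 'x' | python3 -c 'import fcntl; print(fcntl.fcntl(0, 1032))')
echo "default pipe capacity: $cap bytes"
```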

A good example of the problem is compressing data from a slow input, or to a slow output, with bzip2, which compresses in blocks of up to 900 kB by default.

In contrast, the stdbuf command only sets the size of the stdio (FILE *) buffer of stdout (using LD_PRELOAD and setvbuf()), so it will not help with pipe blocking.

Upvotes: 0

moritz

Reputation: 12842

There is a tool called stdbuf that lets you specify the size of a command's stdio output buffer, something like:

stdbuf -o 1M commandA | commandB

Upvotes: 0

Johannes Gerer

Reputation: 25818

There is another tool, pv - pipe viewer:

process1 | pv -pterbTCB 1G | process2

B specifies the buffer size, here 1 gibibyte (GiB)
C disables splice, which is required for B
T shows the buffer level
pterb are the default display switches; they have to be listed explicitly because T is given

pv might be available on systems where mbuffer/buffer is not in the official repositories (such as Arch Linux).
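For use in scripts, the same flags can be wrapped in a function. This is a sketch, assuming pv is installed. The name pvbuffer is hypothetical, and -q (quiet) is used instead of the display switches when no progress output is wanted:

```shell
#!/usr/bin/env bash
# Hypothetical helper around pv (assumed installed): a silent buffer.
#   -q  quiet: no progress display
#   -C  no splice, which the -B buffer size requires
#   -B  buffer size (K, M, G suffixes); defaults to 1G here
pvbuffer() {
    pv -q -C -B "${1:-1G}"
}

# Usage: process1 | pvbuffer 1G | process2
```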

Upvotes: 28

ceving

Reputation: 23856

The program buffer uses shared memory. This might be a problem: if it fails, the memory may leak, because shared memory can outlive the program that allocated it.

An alternative may be GNU dd:

commandA |
dd status=none iflag=fullblock bs=1M |
commandB

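It is important to use the fullblock option. When reading from a pipe, read() may return less than a full block, and without fullblock dd counts such a short read as a whole block, which can lose data (for example together with count=).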

Parameters of dd explained

status=none
Set the level of information to print to stderr; 'none' suppresses everything but error messages.

iflag=fullblock
Accumulate full blocks of input.

bs=1M
Read and write up to 1 MiB (1,048,576 bytes) at a time (default: 512 bytes).
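A quick way to check that the dd buffer passes all the data through is to count the bytes on both sides. This is a sketch, assuming GNU dd; head -c and wc -c stand in for commandA and commandB:

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU dd (coreutils). Sends 3,000,000 bytes through the
# dd buffer from the answer and counts how many arrive on the other side.
set -o pipefail
sent=3000000
received=$(head -c "$sent" /dev/zero |
    dd status=none iflag=fullblock bs=1M |
    wc -c)
echo "sent $sent bytes, received $received bytes"
```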

Upvotes: 5

Samus_

Reputation: 2993

Alternatively, you could use a named pipe and run them in parallel:

mkfifo myfifo
commandB < myfifo &
commandA > myfifo
rm myfifo
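A slightly more robust version of the same idea is sketched below, assuming bash and mktemp -d. It creates the FIFO in a private temporary directory, removes it on exit, and waits for commandB so the script does not finish first. seq and wc -l stand in for commandA and commandB:

```shell
#!/usr/bin/env bash
# Sketch of the named-pipe approach with cleanup.
# seq and wc -l are stand-ins for commandA and commandB.
set -eu
dir=$(mktemp -d)                  # private directory for the FIFO
trap 'rm -rf "$dir"' EXIT         # remove it when the script exits
mkfifo "$dir/myfifo"

wc -l < "$dir/myfifo" > "$dir/result" &   # commandB reads from the FIFO
seq 1 1000 > "$dir/myfifo"                # commandA writes into it
wait                                      # wait for commandB to finish
lines=$(cat "$dir/result")
echo "commandB received $lines lines"
```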

Upvotes: -4

sehe

Reputation: 393114

You can use

buffer (mentioned in another answer)
mbuffer (works on Solaris too, and possibly other Unixes)

E.g.

    process1 | mbuffer -m 1024M | process2

to use a 1 GiB buffer.
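The same can be wrapped in a small function. This is a sketch, assuming mbuffer is installed. The name mbuf is hypothetical, and -q suppresses mbuffer's status display on stderr:

```shell
#!/usr/bin/env bash
# Hypothetical helper around mbuffer (assumed installed).
#   -q  quiet: do not display the status on stderr
#   -m  size of the memory buffer (k, M, G suffixes); defaults to 1G here
mbuf() {
    mbuffer -q -m "${1:-1G}"
}

# Usage: process1 | mbuf 1024M | process2
```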

Upvotes: 8

Eugen Rieck

Reputation: 65284

BUFFER is called buffer. (man 1 buffer, maybe after apt-get install buffer)

Upvotes: 30
