Reputation: 10393
Given a bash command line of the form
commandA | commandB
I want to add a buffer of size ~1MB that sits between commandA and commandB. I would expect to be able to do this with something of the form
commandA | BUFFER | commandB
but what is the command to use for BUFFER?
Remark: I want to do this in order to decouple the two commands so that they parallelize better. The problem is that commandB processes data in large chunks, which currently means that commandA blocks until commandB is done with a chunk. So everything runs sequentially :-(
Upvotes: 36
Views: 19402
Reputation: 41
Instead of using an intermediate command in the pipe, you can increase the pipe capacity using the pipesz command (available since util-linux 2.39):
commandA | pipesz -i -s 1M commandB
or
pipesz -o -s 1M commandA | commandB
or you can use a Python script to do the same:
commandA | python -c 'import sys, os, fcntl; fcntl.fcntl(0, fcntl.F_SETPIPE_SZ, 1024*1024); os.execvp(sys.argv[1], sys.argv[1:])' commandB
The fcntl.F_SETPIPE_SZ constant is available since Python 3.10. For older versions, replace it with 1031 (its value on Linux).
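A minimal sketch that wraps the one-liner above in a shell function and falls back to the numeric value 1031 when fcntl.F_SETPIPE_SZ is missing (Python older than 3.10). The function name with_big_stdin_pipe is hypothetical; the sketch assumes Linux, a python3 interpreter on PATH, and that stdin is actually a pipe.

```shell
#!/bin/sh
# Hypothetical helper: enlarge the capacity of the pipe on stdin, then exec
# the given command so it inherits that pipe. Linux only; needs python3.
with_big_stdin_pipe() {
  python3 -c '
import fcntl, os, sys
# fcntl.F_SETPIPE_SZ exists since Python 3.10; 1031 is its value on Linux.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
fcntl.fcntl(0, F_SETPIPE_SZ, int(sys.argv[1]))  # new capacity in bytes
os.execvp(sys.argv[2], sys.argv[2:])            # replace python with the command
' "$@"
}

# Usage: commandA | with_big_stdin_pipe 1048576 commandB
seq 1 1000 | with_big_stdin_pipe 1048576 wc -l
```

Because of execvp, commandB replaces the Python process and keeps the enlarged pipe on its stdin.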
The maximum pipe capacity for non-root users is limited by the value in /proc/sys/fs/pipe-max-size (1 MiB by default).
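A small sketch, assuming Linux, that clamps a requested capacity to /proc/sys/fs/pipe-max-size before it is handed to pipesz, so a non-root user does not ask for more than the kernel allows. The function name pick_pipe_size is hypothetical, and falling back to 1048576 when the file is unreadable is an assumption based on the default mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the requested pipe size in bytes, clamped to
# the limit in /proc/sys/fs/pipe-max-size that applies to non-root users.
pick_pipe_size() {
  want=${1:-1048576}          # requested size in bytes, default 1 MiB
  max=1048576                 # assumed default when /proc is not readable
  if [ -r /proc/sys/fs/pipe-max-size ]; then
    max=$(cat /proc/sys/fs/pipe-max-size)
  fi
  if [ "$want" -gt "$max" ]; then
    want=$max                 # never request more than the limit
  fi
  echo "$want"
}

size=$(pick_pipe_size 1048576)
echo "pipe size: $size"
# With util-linux >= 2.39 the value could then be used as, e.g.:
#   commandA | pipesz -i -s "$size" commandB
```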
A good example of the problem is compression with bzip2 from a slow input or to a slow output (bzip2 compresses data in blocks of up to 900 kB).
In contrast, the stdbuf command only sets the size of the stdio FILE *stdout buffer (using LD_PRELOAD and setvbuf()) and will not help with pipe blocking.
Upvotes: 0
Reputation: 12842
There is a tool called stdbuf that lets you specify the size of a command's stdio output buffer (not of the pipe itself), something like:
stdbuf -o 1M commandA | commandB
Upvotes: 0
Reputation: 25818
There is another tool, pv (Pipe Viewer):
process1 | pv -pterbTCB 1G | process2
B: specifies the buffer size, here 1 gibibyte
C: disables splice, which is required for B to take effect
T: shows the buffer level
pterb: the default display switches, which must be listed explicitly because giving T disables the defaults
pv might be available on systems where mbuffer/buffer is not in the official repositories (such as Arch Linux).
Upvotes: 28
Reputation: 23856
The program buffer uses shared memory. This might be a problem: in case of an error, the memory may leak, because shared memory can outlive the program that allocated it.
An alternative may be GNU dd:
commandA |
dd status=none iflag=fullblock bs=1M |
commandB
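It is important to use the fullblock option. Otherwise reads from a pipe may return short blocks, so dd will not accumulate full 1M blocks, and in combination with options such as count= or conv=sync, short reads can lose or corrupt data.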
Parameters of dd explained:
status=none
Set the level of information to print to stderr; 'none' suppresses everything but error messages
iflag=fullblock
accumulate full blocks of input
bs=1M
read and write up to 1 MiB (1048576 bytes) at a time (default: 512 bytes)
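As a quick sanity check, the sketch below wraps the dd stage in a hypothetical function buffer_dd and compares checksums with and without it; seq and cksum are arbitrary stand-ins for commandA and commandB. It assumes GNU dd, since status=none and iflag=fullblock are GNU extensions.

```shell
#!/bin/sh
# Hypothetical wrapper around the GNU dd buffering stage shown above.
buffer_dd() {
  dd status=none iflag=fullblock bs=1M
}

# seq stands in for commandA, cksum for commandB. The input is a bit over
# 1 MiB, so it exercises one full block plus a final partial block.
direct=$(seq 1 200000 | cksum)
buffered=$(seq 1 200000 | buffer_dd | cksum)
if [ "$direct" = "$buffered" ]; then
  echo "output identical"
else
  echo "output differs" >&2
fi
```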
Upvotes: 5
Reputation: 2993
Alternatively, you could use a named pipe and run them in parallel:
mkfifo myfifo
commandB < myfifo &
commandA > myfifo
wait
rm myfifo
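A slightly more defensive sketch of the same idea, assuming a POSIX shell with mktemp -d: the FIFO lives in a private temporary directory and the script waits for the consumer before removing it. The function name run_via_fifo is hypothetical, and seq and wc -l stand in for commandA and commandB. Note that on Linux a FIFO uses the same kind of kernel pipe buffer as `|`, so this arrangement by itself does not add capacity.

```shell
#!/bin/sh
# Hypothetical helper: named-pipe variant with a private temp dir and cleanup.
run_via_fifo() {
  dir=$(mktemp -d) || return 1
  fifo="$dir/myfifo"
  mkfifo "$fifo" || { rmdir "$dir"; return 1; }
  wc -l < "$fifo" &          # consumer (commandB) reads from the FIFO
  reader=$!
  seq 1 1000 > "$fifo"       # producer (commandA) writes into the FIFO
  wait "$reader"             # let the consumer finish before cleaning up
  rm -f "$fifo"
  rmdir "$dir"
}

run_via_fifo
```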
Upvotes: -4
Reputation: 393114
You can use mbuffer, e.g.
process1 | mbuffer -m 1024M | process2
to use a 1 GiB buffer.
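A sketch of a portable BUFFER stage that prefers mbuffer, then pv, and otherwise falls back to the GNU dd approach from another answer. The function name buffer_stage and the 1M default are illustrative assumptions; the flags used are mbuffer's quiet and memory-size options and pv's quiet, no-splice and buffer-size options.

```shell
#!/bin/sh
# Hypothetical portable "BUFFER" stage: mbuffer, then pv, then GNU dd.
buffer_stage() {
  size=${1:-1M}
  if command -v mbuffer >/dev/null 2>&1; then
    mbuffer -q -m "$size"                         # dedicated buffering tool
  elif command -v pv >/dev/null 2>&1; then
    pv -q -C -B "$size"                           # quiet, no splice, buffer size
  else
    dd status=none iflag=fullblock bs="$size"     # accumulate full blocks
  fi
}

# Usage: commandA | buffer_stage 1M | commandB
seq 1 100000 | buffer_stage 1M | tail -n 1
```

The dd fallback only decouples the two commands partially: while dd is writing a block it is not reading, whereas mbuffer reads and writes concurrently.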
Upvotes: 8
Reputation: 65284
BUFFER is called buffer (see man 1 buffer; you may need to apt-get install buffer first).
Upvotes: 30