Reputation: 1149
I am looking to solve the problem of writing a series of very large streams concatenated to stdout, and then reading those streams from stdin again, splitting them back into their original parts. The limitation I face is that at no time can I create any temporary files on disk.
I tried to use the unxz --single-stream option, but it doesn't have the effect I expected.
To demonstrate what I am trying to achieve, I have two scripts:
user@localhost:~# cat test-source.sh
#!/bin/bash
echo "one" | xz
echo "two" | xz
echo "three" | xz
The output of the first script is then piped into the second script, which is intended to reverse the process:
user@localhost:~# cat test-sink.sh
#!/bin/bash
unxz --single-stream
unxz --single-stream
unxz --single-stream
The second script is expected to output the following:
one
two
three
Instead I see the following:
user@localhost:~# ./test-source.sh | ./test-sink.sh
one
unxz: (stdin): File format not recognized
unxz: (stdin): File format not recognized
xz was just one option I tried; I am open to other suggestions. gzip decompresses the whole input at once, but I need to preserve the boundaries between the streams.
As I understand it, tar is no good here, as it cannot archive a stream read from stdin.
Is there any other tool out there that can be used to script this?
Upvotes: 1
Views: 201
Reputation: 1149
As an alternative, I came up with tarmux, a multiplexer/demultiplexer written in C and based on the tar file format as implemented by libarchive.
The test scripts now look like this:
Little-Net:trunk minfrin$ cat ./test-source.sh
#!/bin/bash
echo "one" | tarmux
echo "two" | tarmux
echo "three" | tarmux
And this:
Little-Net:trunk minfrin$ cat ./test-sink.sh
#!/bin/bash
tardemux
tardemux
tardemux
The output of tardemux can be piped into other commands, and at no point does a file touch a disk.
Upvotes: 1
Reputation: 311576
I don't know if this will solve your problem (it requires installing some software, which, given the nature of this question, may not be an option), but you inspired me to hack together something that does exactly what you were describing.
You can iteratively produce an output stream from several chunks, as in:
echo "one" | xz | mux
echo "two" | xz | mux
echo "three" | xz | mux
And then pass that to a demux command on the other side to extract the individual components. For example, a trivial case:
$ (
echo "one" | xz | mux
echo "two" | xz | mux
echo "three" | xz | mux
) | demux -v
INFO:demux:processing stream 0 to stream-0.out
INFO:demux:processing stream 1 to stream-1.out
INFO:demux:processing stream 2 to stream-2.out
This takes the input streams and produces three files in your current directory.
It does other things, too, such as optionally adding a sha256 hash to each stream for data integrity verification.
Upvotes: 2
Reputation: 311576
Given your source script, if I run:
sh test-source.sh | unxz
I get as output:
one
two
three
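That seems to be the behavior you're asking for. Your attempt at running unxz --single-stream several times doesn't work because the first unxz process consumes all of the input, even though it only extracts the first stream: with --single-stream, xz decompresses only the first .xz stream and silently ignores any remaining input that follows it.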
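If you do need the boundaries preserved (for example, to send each part to a different consumer), one workaround using only bash and coreutils is to wrap each stream in a frame that the sink can detect without reading past it. This is a minimal sketch, assuming bash and GNU coreutils `base64`; the function names `frame` and `unframe` are hypothetical helpers, not an existing tool. Each stream is base64-encoded and terminated by an empty line, which base64 output never contains. On the sink side, bash's `read` builtin reads a pipe one byte at a time, so the loop stops exactly at the terminator and leaves the next frame in the pipe for the next call.

```bash
#!/bin/bash
# Minimal sketch (assumes bash and GNU coreutils base64). frame and
# unframe are hypothetical helper names, not an existing tool; both
# scripts would need these function definitions.

# frame: base64-encode stdin, then end the frame with an empty line.
# base64 output never contains an empty line, so it is a safe marker.
frame() {
    base64
    echo
}

# unframe: pass base64 lines through until the empty marker line, then
# decode them. On a pipe, bash's read builtin consumes one byte at a
# time, so nothing after the marker is taken from stdin.
unframe() {
    local line
    while IFS= read -r line && [ -n "$line" ]; do
        printf '%s\n' "$line"
    done | base64 -d
}

# Source side (in place of test-source.sh):
#   echo "one" | xz | frame
#   echo "two" | xz | frame
#   echo "three" | xz | frame
# Sink side (in place of test-sink.sh):
#   unframe | unxz
#   unframe | unxz
#   unframe | unxz

# Self-contained demo without xz: three frames in one pipe, split apart
# again by three separate unframe calls.
{ echo "one" | frame; echo "two" | frame; echo "three" | frame; } |
    { unframe; unframe; unframe; }
```

Byte-at-a-time reads and the roughly 33% base64 overhead make this slow for very large streams, so a purpose-built tool like the ones in the other answers is likely to be much faster. The upside is that memory use stays small and nothing touches the disk.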
Upvotes: 0