Graham Leggett

Reputation: 1149

Concatenating to stdout, then splitting on stdin - is this possible?

I am looking to solve the problem of writing a series of very large streams, concatenated, to stdout, and then reading those streams back from stdin and splitting them into their original parts. The limitation I face is that at no time can I create any temporary files on disk.

I tried to use the unxz --single-stream option, but it isn't having the effect I expect.

To demonstrate what I am trying to achieve, I have two scripts:

user@localhost:~# cat test-source.sh 
#!/bin/bash

echo "one" | xz
echo "two" | xz
echo "three" | xz

The output of the first script is then piped into the second script, which is intended to reverse the effect:

user@localhost:~# cat test-sink.sh 
#!/bin/bash

unxz --single-stream
unxz --single-stream
unxz --single-stream

The above script is expected to output the following:

one
two
three

Instead I see the following:

user@localhost:~# ./test-source.sh | ./test-sink.sh 
one
unxz: (stdin): File format not recognized
unxz: (stdin): File format not recognized

The xz approach above was just one option I tried; I am open to other suggestions. gzip wants to decompress the whole stream at once, but I need to preserve the boundaries between the streams.

I understand that tar is no good, as it cannot archive a stream read from stdin.

Is there any other tool out there that can be used to script this?

Upvotes: 1

Views: 201

Answers (3)

Graham Leggett

Reputation: 1149

As an alternative, I came up with tarmux, a multiplexer/demultiplexer written in C and based on the tar file format as provided by libarchive.

The test scripts now look like this:

Little-Net:trunk minfrin$ cat ./test-source.sh 
#!/bin/bash

echo "one" | tarmux
echo "two" | tarmux
echo "three" | tarmux

And this:

Little-Net:trunk minfrin$ cat ./test-sink.sh 
#!/bin/bash

tardemux
tardemux
tardemux

The output of tardemux can be piped into other commands, and at no point does a file touch a disk.

Upvotes: 1

larsks

Reputation: 311576

I don't know if this will solve your problem (it requires installing some software, which, given the nature of your question, may not be an option), but you inspired me to hack together something that does exactly what you describe:

You can iteratively produce an output stream from several chunks, as in:

echo "one" | xz | mux
echo "two" | xz | mux
echo "three" | xz | mux

You then pass that to a demux command on the other side to extract the individual components. For example:

$ (
  echo "one" | xz | mux
  echo "two" | xz | mux
  echo "three" | xz | mux
  ) | demux -v
INFO:demux:processing stream 0 to stream-0.out
INFO:demux:processing stream 1 to stream-1.out
INFO:demux:processing stream 2 to stream-2.out

This demultiplexes the input and writes each stream to its own file (stream-0.out, stream-1.out, stream-2.out) in the current directory.

It does other things, too, such as optionally adding a SHA-256 hash to each stream for data integrity verification.
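To illustrate how a mux/demux pair can preserve stream boundaries without temporary files, here is a minimal sketch in plain bash. It is not larsks's tool: the `mux` and `demux` functions below are hypothetical, and the framing (base64-encode each stream, end it with an empty line) is an assumption chosen because GNU coreutils `base64` only ever emits non-empty lines, so an empty line can only be a terminator.

```shell
#!/bin/bash
# Hypothetical mux/demux pair (not larsks's implementation).
# Framing: each stream is base64-encoded and followed by one empty line.

mux() {
  base64   # encode stdin; binary data (including NUL bytes) becomes text lines
  echo     # empty line marks the end of this stream
}

demux() {
  local line
  # The read builtin consumes a pipe one byte at a time, so the loop stops
  # right after the terminating empty line and leaves the next stream unread.
  while IFS= read -r line && [ -n "$line" ]; do
    printf '%s\n' "$line"
  done | base64 -d
}

# Demo: three streams in, each one routed to a different command.
{
  echo "one" | mux
  echo "two" | mux
  echo "three" | mux
} | {
  demux | sed 's/^/stream 1: /'
  demux | sed 's/^/stream 2: /'
  demux | sed 's/^/stream 3: /'
}
```

With compression, the sending side would become `echo "one" | xz | mux` and each receiving step `demux | unxz`. The trade-offs of this sketch are that base64 inflates the data by about a third and the per-line shell loop is slow on very large streams; a compiled tool such as tarmux or larsks's mux/demux avoids both.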

Upvotes: 2

larsks

Reputation: 311576

Given your source script, if I run:

sh test-source.sh | unxz

I get as output:

one
two
three

That seems to be the behavior you're asking for. Running unxz --single-stream several times doesn't work because the first unxz process reads ahead and consumes all of the input, even though it only extracts the first stream.
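The read-ahead effect is easy to reproduce without xz. A minimal sketch, assuming GNU coreutils `head` on Linux: `head`, like `unxz`, reads a pipe in blocks and has no way to push surplus bytes back into it, whereas the shell's `read` builtin reads a pipe one byte at a time and stops right after the newline.

```shell
#!/bin/bash
# head reads a whole block from the pipe and prints only the first line;
# the rest of the block is discarded, so cat typically receives nothing.
printf 'one\ntwo\n' | { head -n 1; echo "left for cat: $(cat)"; }

# read consumes only the bytes of the first line, so the remainder is
# still in the pipe for the next command.
printf 'one\ntwo\n' | { IFS= read -r first; echo "$first"; echo "left for cat: $(cat)"; }
```

The second pipeline prints `left for cat: two`, which is the kind of behavior the three separate unxz calls would need and do not have.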

Upvotes: 0
