ChristopherDBerry

Reputation: 1772

Efficient transfer of console data, tar & gzip/bzip2 without creating intermediary files

Linux environment. We have a program, 't_show', which, when executed with an ID, writes the price data for that ID to the console. There is no other way to get this data.

I need to copy the price data for IDs 1-10,000 between two servers, using as little bandwidth and as few connections as possible. On the destination server the data should end up as a separate file for each ID, named:

<id>.dat

Something like this would be the long-winded solution:

source:

for id in $(seq 1 10000)
do
    ./t_show "$id" > "$id.dat"
done
tar cf - *.dat | nice gzip -c > dat.tar.gz

dest:

scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar

That is: write each output to its own file, tar and compress, send over the network, extract.

The problem is that I need to create a new file for each ID on the source server. This takes up a lot of space and doesn't scale well.

Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing compressed data directly across the network, skipping tar)?

The tar archive would need to extract on the destination server into a separate file for each ID, as described above.

Thanks to anyone who takes the time to help.

Upvotes: 1

Views: 1132

Answers (6)

ChristopherDBerry

Reputation: 1772

Thanks all

I've taken the advice 'just send the data formatted in some way and parse it on the receiver', which seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.

Here is the Perl script. It breaks the IDs into groups of 1000; the IDs are the source_id keys of a hash table. Each group is sent over a single ssh connection, with every ID's data preceded by a 'HEADER' line so the receiver can write it to the appropriate file. This is a lot more efficient:

sub copy_tickserver_files {
    my $self = shift;

    my $host = $self->{source_env}->{tickserver};
    my $cmd  = 'cd tickserver/ ; ';
    my $i    = 0;

    # Prefix each ID's data with a HEADER line so the receiver can split the stream.
    for my $source_id ( keys %{ $self->{id_translations} } ) {
        $cmd .= qq{ echo HEADER $source_id ; ./t_show $source_id ; };
        $i++;
        # Flush every 1000 IDs to keep the remote command line manageable.
        if ( $i % 1000 == 0 ) {
            $self->copy_tickserver_files_subset( qq{ssh -C dba\@$host " $cmd " |} );
            $cmd = 'cd tickserver/ ; ';
        }
    }

    # Send whatever is left over (fewer than 1000 IDs), if anything.
    if ( $i % 1000 != 0 ) {
        $self->copy_tickserver_files_subset( qq{ssh -C dba\@$host " $cmd " |} );
    }
}

sub copy_tickserver_files_subset {
    my ( $self, $cmd ) = @_;

    # $cmd ends in '|', so this two-argument open runs it and reads its output.
    open my $ticks, $cmd or die "Cannot run '$cmd': $!";

    my $out;
    while ( my $line = <$ticks> ) {
        # A HEADER line starts the file for the next ID.
        if ( $line =~ m{\A HEADER [ ] ([0-9]+) \s* \z}xms ) {
            close $out if $out;
            my $file = "$self->{tmp_dir}/$1.ts";
            open $out, '>', $file or die "Cannot write '$file': $!";
            next;
        }
        next unless $out;    # ignore anything before the first HEADER
        print {$out} $line;
    }
    close $out if $out;
    close $ticks;
}
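
For anyone who prefers to stay in the shell, the same HEADER-delimited idea can be sketched with awk on the receiving end. This is only a rough local demonstration, not what I actually run: the t_show function is a hypothetical stand-in for the real program, the temporary output directory is made up, and in real use the sending loop would run on the source host behind ssh -C.

```shell
#!/bin/sh
# Hypothetical stand-in for the real ./t_show program.
t_show() { printf 'tick A for %s\ntick B for %s\n' "$1" "$1"; }

outdir=$(mktemp -d)

# Sender side: one HEADER line per ID, followed by that ID's data.
for id in 1 2 3; do
    echo "HEADER $id"
    t_show "$id"
done |
# Receiver side: start a new file at each HEADER line, copy everything else.
awk -v dir="$outdir" '
    /^HEADER [0-9]+$/ {
        if (out != "") close(out)
        out = dir "/" $2 ".ts"
        printf "" > out          # create the file even if no data follows
        next
    }
    out != "" { print > out }
'

ls "$outdir"
```

One caveat: a data line that itself looks exactly like "HEADER <number>" would be mistaken for a delimiter, so this only suits data where that cannot happen.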

Upvotes: 0

f4m8

Reputation: 410

I would try this:

(for ID in $(seq 1 10000); do echo $ID: $(./t_show $ID); done) | ssh user@destination "ImportscriptOrProgram"

This prints lines of the form "1: ValueOfID1" to standard output, which is transferred via ssh to the destination host. There you start your import script or program, which reads the lines from standard input.
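
A minimal sketch of what such an import script could look like, run locally here instead of over ssh. The names import_lines and t_show are hypothetical stand-ins, and the output directory is a temporary one created for the demonstration. Note that the unquoted $(...) in the sender collapses newlines into spaces, so this form only suits data that fits on one line per ID.

```shell
#!/bin/sh
# Hypothetical stand-in for the real ./t_show program.
t_show() { printf 'price for %s\n' "$1"; }

# Hypothetical import script: reads "ID: value" lines on stdin and
# writes each value to DIR/ID.dat.
import_lines() {
    dir=$1
    while IFS= read -r line; do
        id=${line%%:*}
        case $id in
            ''|*[!0-9]*) continue ;;   # skip lines without a numeric ID prefix
        esac
        printf '%s\n' "${line#*: }" > "$dir/$id.dat"
    done
}

outdir=$(mktemp -d)
for id in 1 2 3; do
    echo $id: $(t_show "$id")
done | import_lines "$outdir"

ls "$outdir"
```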

HTH

Upvotes: 0

Arnaud Le Blanc

Reputation: 99919

You could just send the data formatted in some way and parse it on the receiver.

foo.sh on the sender:

#!/bin/bash
for (( id = 1; id <= 10000; id++ ))
do
    data="$(./t_show "$id")"
    size=$(wc -c <<< "$data")   # byte count, including the newline that <<< appends

    echo "$id" "$size"
    cat <<< "$data"
done
done

On the receiver:

ssh -C user@server './foo.sh' | while read -r file size; do
    dd of="$file.dat" bs=1 count="$size"
done

ssh -C compresses the data during transfer.

Upvotes: 2

rodrigo

Reputation: 98496

You can do better without tar:

#!/bin/bash
for id in $(seq 1 10000)
do
    ./t_show $id
done | gzip

The only difference is that you lose the boundaries between the different IDs, so the receiver gets one continuous stream.

Now put that in a script, say show_me_the_ids, and run this from the client:

ssh user@source ./show_me_the_ids | gunzip

And there they are!

Alternatively, you can pass the -C flag to ssh to compress the connection and drop gzip/gunzip altogether.

If you are really into it, you may try ssh -C, gzip -9 and other compression programs. Personally, I'd bet on lzma -9.
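
If you want to measure rather than bet, a rough sketch like the following compares the compressed size of a sample stream across whatever compressors are installed. The sample function generates made-up data standing in for t_show output, so the numbers only mean something once you feed it the real thing.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real t_show output.
sample() {
    i=1
    while [ "$i" -le 2000 ]; do
        printf 'id=%s price=%s.50\n' "$i" "$(( i % 97 ))"
        i=$(( i + 1 ))
    done
}

raw=$(( $(sample | wc -c) ))
printf '%-8s %s bytes\n' raw "$raw"

for tool in "gzip -1" "gzip -9" "bzip2 -9" "xz -9"; do
    # Skip compressors that are not installed on this machine.
    command -v "${tool%% *}" >/dev/null 2>&1 || continue
    printf '%-8s %s bytes\n' "$tool" "$(( $(sample | $tool | wc -c) ))"
done

gz9=$(( $(sample | gzip -9 | wc -c) ))
```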

Upvotes: 0

carlpett

Reputation: 12613

You can at least tar stuff over an ssh connection:

tar -czf - inputfiles | ssh remotecomputer "tar -xzf -"

How to populate the archive without intermediary files however, I don't know.

EDIT: Ok, I suppose you could do it by writing the tar file manually. The header is specified here and doesn't seem too complicated, but that isn't exactly my idea of convenient...
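
For what it's worth, here is a rough sketch of that manual approach: it writes ustar headers by hand so each t_show result becomes an archive member without ever touching the disk on the sending side. Everything here is an assumption for illustration: t_show is a stand-in function, the helper names zeros, field and tar_entry are made up, member names must be shorter than 100 bytes, and the data is captured with $(...), so NUL bytes and extra trailing newlines would be lost. The pipeline runs locally through gzip; in real use the sending half would sit behind ssh.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL    # make ${#var} count bytes, not characters

# Hypothetical stand-in for the real ./t_show program.
t_show() { printf 'price data for id %s\n' "$1"; }

# zeros N: print N NUL bytes (N may be 0).
zeros() { dd if=/dev/zero bs=1 count="$1" 2>/dev/null; }

# field VALUE WIDTH: print VALUE padded with NUL bytes to WIDTH bytes.
field() {
    printf '%s' "$1"
    zeros "$(( $2 - ${#1} ))"
}

# tar_entry NAME DATA: print one archive member (header block + padded data).
# DATA is written followed by a single newline.
tar_entry() (
    name=$1 data=$2
    len=$(( ${#data} + 1 ))
    size=$(printf '%011o' "$len")
    mtime=$(printf '%011o' "$(date +%s)")
    # Checksum: byte sum of the header with the checksum field read as 8 spaces.
    sum=$(printf '%s' "$name" 0000644 0000000 0000000 "$size" "$mtime" 0 ustar 00 |
          od -An -v -tu1 |
          awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s + 8 * 32 }')
    field "$name" 100       # file name
    field 0000644 8         # mode
    field 0000000 8         # uid
    field 0000000 8         # gid
    field "$size" 12        # size in octal
    field "$mtime" 12       # modification time in octal
    printf '%06o\000 ' "$sum"
    printf '0'              # type flag: regular file
    field '' 100            # link name (unused)
    field ustar 6           # magic
    printf '00'             # version
    field '' 247            # uname, gname, devmajor, devminor, prefix, padding
    printf '%s\n' "$data"
    zeros "$(( (512 - len % 512) % 512 ))"
)

outdir=$(mktemp -d)
{
    for id in 1 2 3; do
        tar_entry "$id.dat" "$(t_show "$id")"
    done
    zeros 1024              # end of archive: two empty 512-byte blocks
} | gzip -c | gunzip -c | tar -xf - -C "$outdir"

ls "$outdir"
```

The design choice is to hold one ID's output in a shell variable at a time, because the header needs the size before the data is written. That keeps memory use bounded by the largest single result rather than by the whole data set.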

Upvotes: 1

Thomas Berger

Reputation: 1870

I don't think this will work with a plain bash script. But you could have a look at the Archive::Tar module for Perl, or the equivalent in other scripting languages.

The Perl module has a function, add_data, that creates a "file" on the fly and adds it to the archive, which you can then stream across the network.

The Documentation is found here:

Upvotes: 0
