Chris

Reputation: 6732

tar -z files to several servers at once?

What's a more efficient way to compress and tar the same files to several servers at once? Right now I have:

for SERVER in $SERVERS; do
  tar czpf - my/directory | ssh $SERVER tar xzpf - &
done && wait

It gets the job done, and running the loop in parallel is an improvement, but with ~20 servers that's a lot of redundant zipping on my end. Is there a way to compute the tarball only once, then duplicate that same output into each ssh command in the loop?

Upvotes: 1

Views: 91

Answers (2)

glenn jackman

Reputation: 247082

This will not be DRY, but process substitutions and tee can do it:

tar czpf - my/directory | tee \
    >(ssh server1 tar xzpf -) \
    >(ssh server2 tar xzpf -) \
    >(ssh server3 tar xzpf -) \
    >(ssh server4 tar xzpf -) \
    ...                       \
    >(ssh server20 tar xzpf -) \
    >/dev/null

Redirecting tee's standard output to /dev/null at the end keeps the compressed archive data from being dumped on the terminal.

...

I see this is just glglgl's point #1 spelled out explicitly, so I made this answer community wiki.

Upvotes: 3

glglgl

Reputation: 91119

You could do something with tee and either a FIFO or (on more advanced shells, such as bash) the >(command) construct.

This could work in 2 possible ways:

  1. You could build up your command line in a loop so that you end up with a command like tee >(ssh host1 tar xzpf -) >(ssh host2 tar xzpf -). How to do so in Bash is not completely clear to me; either you could use an array (but I am not sure), or you'll have to use eval.
  2. You could start with an "initial" named FIFO that the tar data is piped into. Then, on each iteration, you create a new named FIFO and chain it on: mkfifo next_pipe; tee <previous_pipe >(ssh ...) >next_pipe. At the end, you'll have to cat the last pipe to /dev/null in order to start the whole, complicated pipeline.

    Something like

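    mkfifo 0
    # Opening a FIFO for writing blocks until a reader shows up, so background it.
    tar czpf - my/directory > 0 &
    i=0
    for SERVER in $SERVERS; do
        i=$((i+1))
        mkfifo $i
        # Copy FIFO i-1 to this server and on into FIFO i.
        tee <$((i-1)) >(ssh $SERVER tar xzpf -) >$i &
    done
    # Drain the last FIFO so the whole chain starts flowing.
    cat $i >/dev/null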
    

and then wait for all processes to finish. And then clean up your directory!
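A variation on the FIFO idea, as a rough sketch rather than a tested solution: keep one FIFO per host in a private temporary directory and let a single tee write to all of them, so that cleanup is a single rm. The function names fan_out and send are made up for this illustration, and it assumes bash (for arrays) plus mktemp and mkfifo.

```shell
#!/usr/bin/env bash
# Sketch only. send() is a hypothetical hook: it delivers stdin to one host.
send() { ssh "$1" tar xzpf -; }

# fan_out HOST...: copy stdin to every host through a single tee.
fan_out() {
    local dir host pid rc=0 i=0 fifos=() pids=()
    dir=$(mktemp -d) || return 1
    for host in "$@"; do
        i=$((i+1))
        mkfifo "$dir/$i"
        # The FIFO is opened inside the background job, so the loop never blocks.
        send "$host" <"$dir/$i" &
        pids+=("$!")
        fifos+=("$dir/$i")
    done
    # tee opens each FIFO for writing and duplicates stdin into all of them.
    tee "${fifos[@]}" >/dev/null
    # Collect the exit status of every sender; remember a failure if there is one.
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=$?
    done
    rm -rf "$dir"    # the FIFOs live in a private temp dir, so cleanup is one rm
    return "$rc"
}

# Usage (not run here): tar czpf - my/directory | fan_out server1 server2 server3
```

Because the senders are ordinary background jobs rather than process substitutions, wait can collect them. Note that one slow or dead host stalls tee and therefore everybody else.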

That said, I personally would probably write a Python script which does all that teeing stuff and which works via pipes, not via named FIFOs. Among other things, that spares you the pain of cleaning up at the end.

If that is not an option, using a local intermediate file seems the best solution to me.

But if it is, here is a first shot at a Python program which does the job. It is an ad-hoc shot which is completely untested, but it shows the idea.

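#!/usr/bin/env python
import subprocess
import sys

cmd1 = sys.argv[1]    # local command that produces the stream
cmd2 = sys.argv[2]    # command run on every host, reading the stream on stdin
hosts = sys.argv[3:]

# Run the producer once and capture its output.
sp1 = subprocess.Popen(cmd1, shell=True, stdout=subprocess.PIPE)

# One ssh per host, each fed through its own pipe.
spn = []
for host in hosts:
    sp = subprocess.Popen(['ssh', host, cmd2], stdin=subprocess.PIPE)
    spn.append(sp)

# Copy the stream block by block to every ssh process.
while True:
    block = sp1.stdout.read(4096)
    if not block:
        break
    for sp in spn:
        sp.stdin.write(block)

# Close the pipes so the remote commands see EOF; otherwise wait() hangs.
for sp in spn:
    sp.stdin.close()

# Exit with the first non-zero status, if any.
c = sp1.wait()
for sp in spn:
    c2 = sp.wait()
    if not c:
        c = c2
sys.exit(c)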

Can be called with

./program 'tar czpf - my/directory' 'tar xzpf -' host1 host2 host3 ...
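For completeness, the local intermediate file mentioned earlier could look roughly like the sketch below: compress once into a temporary file, then feed that file to every ssh in parallel. The names push_archive and send are invented for this example, and it assumes bash and mktemp; treat it as an outline, not a finished script.

```shell
#!/usr/bin/env bash
# Sketch only. send() is a hypothetical hook: it delivers stdin to one host.
send() { ssh "$1" tar xzpf -; }

# push_archive DIR HOST...: compress DIR once, then stream the file to each host.
push_archive() {
    local src=$1 archive host pid rc=0 pids=()
    shift
    archive=$(mktemp) || return 1
    if tar czpf "$archive" "$src"; then
        # Every host reads the same file independently, so a slow host
        # does not hold up the others.
        for host in "$@"; do
            send "$host" <"$archive" &
            pids+=("$!")
        done
        for pid in "${pids[@]}"; do
            wait "$pid" || rc=$?
        done
    else
        rc=1
    fi
    rm -f "$archive"
    return "$rc"
}

# Usage (not run here): push_archive my/directory server1 server2 server3
```

The price is temporary disk space for one tarball; in exchange there is no tee, no FIFO, and each transfer runs at its own speed.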

Upvotes: 2
