Daniel Böhmer

Reputation: 15381

Bash piping: split input after 1st line, run different commands on line 1 and the rest

I have a backup script which copies a file via ssh and creates a hash file during transfer. Additionally pv is used to display a progress bar:

ssh $host "cat '$src_dir/$filename'" \
    | pv --bytes --eta --progress --rate --timer --wait  $opts \
    | tee "$filename" \
    | sha1sum > "$filename.sha1"

To display the progress bar correctly I need to pass the size of the file to pv. At the moment I approximate it with the size of the last backup file.

It is important that everything is done with a single ssh call because I don't want to authenticate multiple times for one backup. SSH keys are in use but protected with a passphrase.

My idea is to call ssh $host "stat -c%s '$src_dir/$filename'; cat '$src_dir/$filename'" and split the output after the 1st line. I could read the filesize and then call pv on the rest of the input.

The backup script is working fine. This is just an exercise for the fun of it. Any ideas are appreciated, but I will not rewrite the whole script. In particular, I won't switch to scp, because then I can't do the hashing during transfer, and I don't want to trust the local storage for that (of course I do, but then why do we hash at all?).

Update: I ended up doing this:

ssh $host "cd '$src_dir' && stat -c%s '$filename' && sha1sum '$filename' && cat '$filename'" | {
        read size;
        head -n 1 > "$filename.sha1";
        pv --bytes --eta --progress --rate --size $size --timer --wait > "$filename"
    }

It works perfectly and has the advantage that the hash is computed at the remote site, so it is taken not only before the data is written to disk but also before the network transfer.
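One caveat about the update above: head -n 1 reading from a pipe may buffer more than the first line, so whether it leaves the file data untouched can depend on timing. A minimal sketch of a variant that reads both header lines with the shell's read builtin, which stops at the newline. The names remote_stream and receive are hypothetical; remote_stream is only a local stand-in for the ssh command list, and cat stands where pv would go in the real script.

```shell
#!/usr/bin/env bash
# Hypothetical local stand-in for the remote command list:
# line 1 = size in bytes, line 2 = sha1sum output, then the raw file data.
remote_stream() {
    file=$1
    wc -c < "$file" | tr -d ' '   # portable size; the real call uses stat
    sha1sum "$file"
    cat "$file"
}

# Consume the two header lines with read, then hand the untouched
# remainder of the stream to the next command.
receive() {
    dest=$1
    IFS= read -r size
    IFS= read -r hash_line
    printf '%s\n' "$hash_line" > "$dest.sha1"
    # In the real script, pv ... --size "$size" would replace cat here.
    cat > "$dest"
    printf '%s\n' "$size"
}
```

Usage would look like remote_stream some_file | receive copy, which writes copy and copy.sha1 and prints the size that was read from the first line.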

Upvotes: 2

Views: 835

Answers (1)

William Pursell

Reputation: 212248

In this example, I use stat on OS X, where the format string for the size is %z (with -f) rather than %s (with -c) as in GNU stat:

$ { stat -f %z input; cat input; } | { read s; echo $s; }

Replace the first command list with your ssh call, and replace the echo with your pv list, and you are good to go. In other words, your final command should be:

$ ssh $host "stat -c%s '$src_dir/$filename'; cat '$src_dir/$filename'" | {
    read size;
    pv --bytes --eta --progress --rate --timer --wait --size $size |
    tee "$filename" |
    sha1sum; } > "$filename.sha1"

A few things to note: I don't have access to pv and did not test the above. I replaced your line continuations with trailing pipe symbols, because I think it looks nicer.

Another idea: on the right hand side of the pipe, you could do:

{ pv --bytes ... --size $( sed 1q ) |
...

This definitely relies on sed not reading past the first newline, and I do not know if that is guaranteed by any standard, but it works with ... oh my, my sed does not support --version, -?, or -h, and I do not know the package management system of OS X well enough to tell you what sed I'm running. It works with some version of BSD sed...
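To see whether a particular tool leaves the rest of the pipe alone, one rough check is to count the bytes that remain after it has consumed the header line. This is only a sketch: bytes_left_after and read_one_line are hypothetical helper names, and head -c is assumed to be available (it is in GNU and BSD userlands).

```shell
#!/usr/bin/env bash
# Send a one-line header followed by N zero bytes through a pipe, let the
# given command consume the header, then print how many bytes remain.
bytes_left_after() {
    payload_bytes=$1
    shift
    { echo header; head -c "$payload_bytes" /dev/zero; } |
        { "$@" > /dev/null; wc -c; } |
        tr -d ' '
}

# Wrapper so the shell builtin can be passed as a command name.
read_one_line() { IFS= read -r line; }

bytes_left_after 100000 read_one_line   # read stops at the newline
bytes_left_after 100000 sed 1q          # may be smaller if sed read ahead
```

The read variant should always report the full 100000 bytes. The sed result can vary between implementations and even between runs, because it depends on how much data was already in the pipe when sed issued its first read.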

Upvotes: 2
