user4019417

Using RSync to copy a sequential range of files

Sorry if this makes no sense, but I will try to give all the information needed!

I would like to use rsync to copy a range of sequentially numbered files from one folder to another.

I am archiving a DCDM (it's a film thing) that contains on the order of 600,000 individually numbered, sequential .tif image files (~10 MB each).

I need to break this up to archive it properly onto LTO6 tapes, and I would like to use rsync to prep the folders so that my simple bash .sh file can automate the various folders and files that I want to back up to tape.

The command I normally use when running rsync is:

sudo rsync -rvhW --progress --size-only <src> <dest>

I use sudo if needed, and I always test the outcome first with --dry-run.

The only way I’ve got anything to work (without kicking out errors) is by using the * wildcard. However, this only handles files matching the given pattern (e.g. 01* will only move files in the range 010000 - 019999), and I would have to repeat it for 02, 03, 04, etc.

I've looked on the internet, and am struggling to find an answer that works.

This might not be possible, and with 600,000 .tif files, I can't write an exclude for each one!

Any thoughts as to how (if at all) this could be done?

Owen.

Upvotes: 6

Views: 4853

Answers (4)

Jamie Metzger

Reputation: 48

If you are writing to LTO6 tapes, you should consider adding --inplace to your command. With --inplace, rsync writes directly to the destination file instead of building a temporary copy and renaming it, which suits linear filesystems such as LTO.

Upvotes: 0

mulllhausen

Reputation: 4435

Old question, I know, but someone may find this useful. The glob patterns for expanding a range shown in the other answers also work with rsync. For example, to copy files starting with a, b, and c but not d and e from /tmp/from_here to /tmp/to_here:

$ rsync -avv /tmp/from_here/[a-c]* /tmp/to_here
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
alice/
bob/
cedric/
total: matches=0  hash_hits=0  false_alarms=0 data=0

sent 89 bytes  received 24 bytes  226.00 bytes/sec
total size is 0  speedup is 0.00
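
The same bracket syntax applies to digits, which is closer to the numbered frames in the question. Here is a minimal sketch, assuming hypothetical six-digit frame names such as 010000.tif in a scratch directory. It only prints what the pattern matches, so nothing is copied:

```shell
# Scratch directory with a few stand-in frame files (names are hypothetical).
src=$(mktemp -d)
for n in 009999 010000 019999 020000 029999 030000; do
    : > "$src/$n.tif"
done

# 0[1-2]* matches names whose first two characters are 01 or 02.
matched=$(cd "$src" && echo 0[1-2]*)
echo "$matched"

# The same pattern can be handed to rsync; --dry-run lists without copying
# (/path/to/dest/ is a placeholder):
#   rsync -rvhW --size-only --dry-run "$src"/0[1-2]* /path/to/dest/

rm -rf "$src"
```

Note that the shell expands the pattern before rsync starts, so a pattern matching a very large number of files may exceed the system's argument-length limit.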

Upvotes: 0

5gon12eder

Reputation: 25419

Globbing is the shell feature that expands a wildcard into a list of matching file names. You have already used it in your question.

For the following explanations, I will assume we are in a directory with the following files:

$ ls -l
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 file.txt
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 funny_cat.jpg
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-2.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-3.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-4.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2014-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2014-2.pdf

The simplest case is to match all files. The following makes for a poor man's ls.

$ echo *
file.txt funny_cat.jpg report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf report_2014-1.pdf report_2014-2.pdf

If we want to match all reports from 2013, we can narrow the match:

$ echo report_2013-*.pdf
report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf

We could, for example, have left out the .pdf part, but I like to be as specific as possible.

You have already come up with a solution that uses this to select a range of numbered files. For example, we can match reports by quarter:

$ for q in 1 2 3 4; do echo "$q. quarter: " report_*-$q.pdf; done
1. quarter:  report_2013-1.pdf report_2014-1.pdf
2. quarter:  report_2013-2.pdf report_2014-2.pdf
3. quarter:  report_2013-3.pdf
4. quarter:  report_2013-4.pdf

If we are too lazy to type 1 2 3 4, we could have used $(seq 4) instead. This invokes the program seq with argument 4 and substitutes its output (1 2 3 4 in this case).

Now back to your problem: If you want chunk sizes that are a power of 10, you should be able to extend the above example to fit your needs.
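
One way that extension might look is sketched below. It assumes hypothetical six-digit frame names (000001.tif and so on), so that each two-digit prefix covers a chunk of 10,000 frames. The chunk_NN directory names are made up for illustration, and cp stands in for rsync so the sketch runs anywhere:

```shell
# Scratch source and destination with stand-in frames (hypothetical names).
src=$(mktemp -d)
dest=$(mktemp -d)
for n in 000001 000002 010000 010001 020000; do
    : > "$src/$n.tif"
done

# One chunk per two-digit prefix; seq -f '%02g' prints 00 01 02.
for prefix in $(seq -f '%02g' 0 2); do
    mkdir -p "$dest/chunk_$prefix"
    # Expand the glob into the positional parameters and skip prefixes
    # that match nothing (the pattern would otherwise stay unexpanded).
    set -- "$src/$prefix"*.tif
    [ -e "$1" ] || continue
    # In real use this could be the rsync call from the question, e.g.
    #   rsync -rvhW --size-only "$@" "$dest/chunk_$prefix/"
    cp "$@" "$dest/chunk_$prefix/"
done

# Report how many files landed in each chunk directory.
chunk_counts=$(for d in "$dest"/chunk_*; do
    echo "${d##*/}: $(ls "$d" | wc -l | tr -d ' ')"
done)
echo "$chunk_counts"

rm -rf "$src" "$dest"
```

For the full archive, the upper bound of the seq call would grow to cover every prefix in use (up to 59 for roughly 600,000 frames under this naming assumption).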

Upvotes: 1

John B

Reputation: 3646

You can check for the file name starting with a digit by using pattern matching:

for file in [0-9]*; do
    # do something to $file name that starts with digit
done

Or, you could enable the extglob option and loop over all file names that contain only digits. This could eliminate any potential unwanted files that start with a digit but contain non-digits after the first character.

shopt -s extglob
for file in +([0-9]); do
    # do something to $file name that contains only digits
done
  • +([0-9]) matches one or more digits

Update:

Based on the file name pattern in your recent comment:

shopt -s extglob
for file in legendary_dcdm_3d+([0-9]).tif; do
    # do something to $file
done
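
If the goal is an arbitrary numeric range rather than everything matching a pattern, one option is to generate the file names and pass them to rsync with --files-from. The following is a rough sketch only: the six-digit zero padding, the frame numbers, and the /path/to/... placeholders are assumptions, not something stated in the question.

```shell
# Hypothetical naming scheme: <prefix><six-digit frame number>.tif
prefix=legendary_dcdm_3d
first=12
last=15

# Build the list of file names for frames first..last, zero-padded to 6 digits.
list=$(mktemp)
for n in $(seq "$first" "$last"); do
    printf '%s%06d.tif\n' "$prefix" "$n"
done > "$list"
cat "$list"

# rsync can read the names (relative to the source directory) from the list:
#   rsync -rvhW --size-only --files-from="$list" /path/to/src/ /path/to/dest/

frames_listed=$(wc -l < "$list" | tr -d ' ')
first_name=$(head -n 1 "$list")
rm -f "$list"
```

Because the names come from a file instead of the command line, this approach does not depend on how many files the shell can pass as arguments.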

Upvotes: 3
