Paul

Reputation: 1390

copy with rsync when files are different

I have to copy a big directory to my NAS using rsync. I would like rsync to copy only the files that differ between source and destination, so that files that were already copied are not copied again.

Upvotes: 3

Views: 7674

Answers (1)

Mecki

Reputation: 132909

Skipping identical files is the whole reason people use rsync, and it is rsync's default behavior. Most of the time the only option you need is -a:

rsync -a -P <source> <dest>

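The -P is shorthand for --partial --progress: it shows progress and keeps partially transferred files so a later run can reuse them. The -a means "archive", i.e. "when copying files, make the copy as identical as possible" (keep permissions, ownership, timestamps, etc.), but it also means "only update files if you have to", because preserving modification times is what lets later runs recognize unchanged files. It's like saying "make sure <dest> is an up-to-date backup of <source>".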

Note that by default rsync already considers two files identical if they have the same file size and the same last modification date. Of course, two files can have the same size and modification date and still not be identical. So when running the command for the very first time, when you are not sure which files need an update and which ones don't, try this:

rsync -a -c -P <source> <dest>

-c means: don't rely on size and date alone; compute a checksum of every file and compare the checksums, and only consider files identical if their checksums match. This makes rsync read every file in full on both sides (at least every file whose size matches). The chunking happens afterwards: for files that do differ, rsync's delta-transfer algorithm splits the file into blocks, checksums each block separately, and transfers only the blocks that have changed.

Even with checksumming, rsync can still save you a lot of time when copying over a network connection. It won't save you any time when copying locally, though, because just copying everything is probably faster than checksumming everything (for local-to-local copies rsync even disables the delta-transfer algorithm by default, as if --whole-file were given). So when both source and destination are local drives, a plain copy will usually beat a checksumming rsync in speed. In that case use

cp -a -v <source> <dest>

or, if your cp doesn't support -a, use

cp -pPR -v <source> <dest>

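which is exactly what -a stands for in BSD and macOS cp (GNU cp's -a additionally preserves hard links and extended attributes). Again, -v just lists each file as it is copied, so you can follow the progress.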
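
If a script has to run on systems with and without `cp -a`, one option is to probe once and pick the flags. This is only a sketch under that assumption; the probe file, `cp_flags` variable and directory names are made up for the demo.

```shell
#!/bin/sh
# Sketch: use cp -a if this cp accepts it, otherwise fall back to cp -pPR.
# Probe file, cp_flags and directory names are placeholders.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Probe once on a throwaway empty regular file.
: > "$work/probe"
if cp -a "$work/probe" "$work/probe.copy" 2>/dev/null; then
    cp_flags=-a
else
    cp_flags=-pPR
fi

mkdir "$work/src"
printf 'data\n' > "$work/src/f.txt"
touch -t 202001010000 "$work/src/f.txt"   # fixed mtime that the copy should keep
cp "$cp_flags" -v "$work/src" "$work/copy"
```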

And I'd only use -c for the very first sync. After that, relying on file size and last modification date usually works very well for updating, and it is a whole lot faster. It works because a file that has been altered since the last sync will have a different last modification date, so just by comparing the dates rsync knows that the file must be updated at the destination. Of course, that only works if all your systems have the correct date and time set, you don't manipulate the modification dates of files, and you don't prevent your system from updating them.

If you want to skip files based solely on whether they already exist at the destination, use this:

rsync -a -P --ignore-existing <source> <dest>

That's like telling rsync "If you see a file with the same name at the destination, always consider it to be identical and never update it".

Please note that if rsync detects that a file in <source> is different from the file in <dest>, whether by size and modification date or by checksum, it will always update the file at <dest> to match the file at <source>. If multiple sources sync to the same destination, you might also want to add -u (--update), which means "if two files differ, only update when the file at <source> has a newer last modification date than the file at <dest>".


Just as a general tip, if you type

man <command>

in a terminal, you will get a help page on most systems (Linux, macOS and other UNIX systems) that explains all the options in detail. You can scroll with the arrow keys or Page Up/Page Down, and leave the view by pressing "q" (quit). For example:

man rsync

Upvotes: 8
