Reputation: 15270
I need to compare two directories to validate a backup.
Say my directory looks like the following:
user@main_server:~/mydir/        user@backup_server:~/mydir/
Filename      Filesize           Filename      Filesize
file1000.txt  4182410737         file1000.txt  4182410737
file1001.txt  8241410737         -                         <-- missing on backup_server!
...                              ...
file9999.txt  2410418737         file9999.txt  1111111111  <-- size != main_server
Is there a quick one-liner that would get me close to output like:
Invalid Backup Files:
file1001.txt
file9999.txt
(with the goal to instruct the backup script to refetch these files)
I've tried variations of the following, to no avail:
[main_server] $ rsync -n ~/mydir/ user@backup_server:~/mydir
I cannot use rsync to back up the directories themselves because it takes far too long (8-24 hrs). Instead I run multiple threads of scp to fetch files in batches. This regularly completes in under an hour. However, occasionally I find a few files that were somehow missed (perhaps due to a dropped connection).
Speed is a priority, so file sizes should be sufficient. But I'm open to including a checksum, provided it doesn't slow the process down the way rsync does.
Here's my test process:
# Generate Large Files (1GB)
for i in {1..100}; do head -c 1073741824 </dev/urandom >foo-$i ; done
# SCP them from src to dest
for i in {1..100}; do ( scp ~/mydir/foo-$i user@backup_server:~/mydir/ & ) ; sleep 0.1 ; done
# Confirm destination has everything from source
# This is the point of the question. I've tried:
rsync -Sa ~/mydir/ user@backup_server:~/mydir
# Way too slow
What do you recommend?
Upvotes: 1
Views: 84
Reputation: 113924
By default, rsync uses a quick check that transfers only files that differ in size or last-modified time. Since you report that the sizes are unchanged, that would seem to indicate that the timestamps differ. Two options to handle this are:
Use -p with scp to preserve timestamps when transferring files (the rsync equivalent is -t, which -a already implies).
Use --size-only with rsync to ignore timestamps and transfer only files that differ in size.
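For the size-only check the question asks for, here is a minimal sketch that compares "name size" listings with comm instead of running rsync. It is not the method described above, only an illustration under stated assumptions: the function names list_sizes and invalid_backup_files and the /tmp paths are hypothetical, GNU find (for -printf) is assumed on both hosts, only top-level regular files are considered, and file names are assumed not to contain newlines.

```shell
# Sketch only: function names and temp paths are hypothetical; assumes GNU find.

# Print a sorted "name size" line for each regular file directly inside directory $1.
list_sizes() {
    find "$1" -maxdepth 1 -type f -printf '%f %s\n' | LC_ALL=C sort
}

# Given a sorted source listing ($1) and a sorted backup listing ($2), print the
# names of files that are missing from the backup or whose size differs.
invalid_backup_files() {
    # comm -23 keeps lines found only in the first listing; sed strips the size.
    LC_ALL=C comm -23 "$1" "$2" | sed 's/ [0-9]*$//'
}

# Example usage (commented out because it needs SSH access to the backup host):
#   list_sizes ~/mydir > /tmp/main.txt
#   ssh user@backup_server "find ~/mydir -maxdepth 1 -type f -printf '%f %s\n' | LC_ALL=C sort" > /tmp/backup.txt
#   invalid_backup_files /tmp/main.txt /tmp/backup.txt
```

Only file metadata is read, so the cost should depend on the number of files rather than their size. With rsync itself, a dry run such as rsync -rn --size-only --out-format='%n' ~/mydir/ user@backup_server:~/mydir/ should list the same candidates without transferring anything.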
Upvotes: 1