Ryan

Reputation: 15270

How can I check that file sizes match between duplicate directories?

I need to compare two directories to validate a backup.

Say my directory looks like the following:

Filename        Filesize      Filename        Filesize
user@main_server:~/mydir/     user@backup_server:~/mydir/
file1000.txt    4182410737    file1000.txt    4182410737
file1001.txt    8241410737    -                          <-- missing on backup_server!
...                           ...
file9999.txt    2410418737    file9999.txt    1111111111 <-- size != main_server

Is there a quick one-liner that would get me close to output like:

Invalid Backup Files:
file1001.txt
file9999.txt

(with the goal to instruct the backup script to refetch these files)

I've tried variations of the following, to no avail:

[main_server] $ rsync -n ~/mydir/ user@backup_server:~/mydir

I cannot use rsync for the backup itself because it takes far too long (8-24 hrs). Instead I run multiple scp processes in parallel to fetch files in batches, which regularly completes in under an hour. Occasionally, though, I find a few files that were somehow missed (perhaps because of a dropped connection).

Speed is a priority, so file sizes should be sufficient. But I'm open to including a checksum, provided it doesn't slow the process down like I find with rsync.

Here's my test process:

# Generate Large Files (1GB)
for i in {1..100}; do head -c 1073741824 </dev/urandom >foo-$i ; done

# SCP them from src to dest
for i in {1..100}; do ( scp ~/mydir/foo-$i user@backup_server:~/mydir/ & ) ; sleep 0.1 ; done

# Confirm destination has everything from source
# This is the point of the question. I've tried:

rsync -Sa ~/mydir/ user@backup_server:~/mydir
# Way too slow

What do you recommend?

Upvotes: 1

Views: 84

Answers (1)

John1024

Reputation: 113924

By default, rsync uses the quick check method, which only transfers files that differ in size or last-modified time. As you report that the sizes are unchanged, that would seem to indicate that the timestamps differ. Two options to handle this are:

  • Use scp -p to preserve timestamps when transferring files (the rsync equivalent is -t, which -a already implies).

  • Use --size-only to ignore timestamps and transfer only files that differ in size.
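
To get the list of invalid backup files asked for in the question, --size-only can be combined with a dry run. The following is a minimal sketch, not a tested recipe for your servers: the function name list_invalid_backup_files is made up for illustration, and it assumes rsync is installed on both ends. Note that -r (or -a) is needed; without it rsync skips directories, which may be why the plain rsync -n attempt printed nothing useful.

```shell
# list_invalid_backup_files SRC DEST   (hypothetical helper name)
#   -r             recurse into SRC
#   -n             dry run: nothing is copied
#   --size-only    ignore timestamps; compare file sizes only
#   --out-format   print just the name of each file rsync would send
# The output is the files that are missing from DEST or differ in size.
list_invalid_backup_files() {
    rsync -rn --size-only --out-format='%n' "$1" "$2"
}

# Example with the hosts and paths from the question:
#   list_invalid_backup_files ~/mydir/ user@backup_server:~/mydir/
```

Because only sizes are compared, a file that was corrupted without changing length will not be reported.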

Upvotes: 1
