Reputation: 4043
To prevent this from being closed, I have narrowed my question to just the bash script.
EDITED QUESTION
I run a small network and made a mistake in a backup routine. I have rsync
running daily, and because of how it is set up, renaming a folder on the source can leave duplicate copies on the backup device.
rsync -varz --no-perms --exclude-from=/path/to/exclude_file --log-file=/path/to/rsync_logs
Recently a user made quite a few changes, and it resulted in a lot of duplication.
What kind of bash script strategies can I use to attack this? I've tried listing both trees recursively, writing the output to files, and using diff
to compare them. That has let me see the scale of the duplication problem. If I could use some kind of automated process to remove the duplicates, it would save me loads of time.
I started by trying something like this:
find /mnt/data/ -maxdepth 2 -mindepth 1 -type d -printf '%f\n' > data.txt
and comparing to:
find /mnt/backup/ -maxdepth 2 -mindepth 1 -type d -printf '%f\n' > backup.txt
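Once those two lists exist, comm can isolate the names that appear only on the backup side, which are exactly the rename leftovers. A minimal sketch, using the directory names from the example below as stand-in data (the real lists would come from the two find commands):

```shell
# Build two sorted name lists (stand-ins for data.txt / backup.txt
# produced by the find commands above).
printf '%s\n' 2008-07-01 other-dir | sort > data.txt
printf '%s\n' 7-1-08 2008-07-01 other-dir | sort > backup.txt

# comm needs sorted input; -1 -3 suppress lines unique to the source
# and lines common to both, leaving only names unique to the backup.
comm -13 data.txt backup.txt > backup_only.txt
cat backup_only.txt
```

Here backup_only.txt ends up containing just `7-1-08`, the renamed duplicate; each name in that file is a candidate for manual review before deletion.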
An example of my problem is this:
drwxr-xr-x 0 bob staff 0 Jun 25 2009 7-1-08
drwxr-xr-x 0 bob staff 0 Jun 25 2009 2008-07-01
This is an example from the backup drive, and the two directories are identical in their contents. The backup contains both and the source has only this one:
drwxr-xr-x 0 bob staff 0 Jun 25 2009 2008-07-01
This kind of issue is all throughout the backup drives.
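Before deleting either of a suspected pair, it is worth confirming the two directories really are identical; diff -r exits 0 only when every file matches recursively. A sketch on a throwaway tree (the demo/ paths and a.txt file are made up for illustration):

```shell
# Recreate the situation from the example: two differently named
# directories with identical contents.
mkdir -p demo/7-1-08 demo/2008-07-01
echo report > demo/7-1-08/a.txt
echo report > demo/2008-07-01/a.txt

# diff -r compares the trees recursively; exit status 0 means identical.
if diff -r demo/7-1-08 demo/2008-07-01 > /dev/null; then
    echo "identical: safe to drop one copy"
fi
```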
EDIT
I created two lists, diffed them, and then went through manually and reconciled the changes. It wasn't as bad as I originally thought, once I got into it. I gave +1s to both answers here (@Mark Pettit and @ebarrere) because I ended up using pieces from each. I ran several find commands in the course of this experiment, and I also altered my rsync
script to be more specific. Thank you guys.
Upvotes: 1
Views: 635
Reputation: 221
Although I agree with @Mark's suggestion to fix the rsync
script, you could use find
with -exec
to find duplicate files. Something like this:
cd /mnt/data
find . -type f -exec sh -c 'test -e "/mnt/backup/$1" && echo "/mnt/backup/$1"' _ {} \;
would echo any file that exists at the same relative path under both trees; the path printed is the one under the backup directory. (Passing {} as a positional parameter, rather than splicing it into the command string, keeps filenames with spaces or quotes from breaking the command.) You could change the echo
to an rm -f
to remove the files, but be careful with that.
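A rehearsal of that echo-then-rm pattern on throwaway directories may make the careful part concrete; src/ and backup/ here are stand-ins for /mnt/data and /mnt/backup, and the idea is to review the printed list before swapping echo for rm -f:

```shell
# Stand-in tree: one file mirrored on both sides, one stale file
# that exists only in the backup.
mkdir -p src backup
echo keep  > src/report.txt
echo keep  > backup/report.txt
echo stale > backup/only-here.txt

cd src
# Dry pass: print backup files that also exist in the source,
# capturing the list for review instead of deleting anything.
find . -type f -exec sh -c 'test -e "../backup/$1" && echo "../backup/$1"' _ {} \; > ../to_review.txt
cd ..
cat to_review.txt
```

Only report.txt shows up in to_review.txt; only-here.txt is untouched and unlisted, since it has no counterpart in the source.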
Upvotes: 2
Reputation: 141
You should fix this by fixing your rsync script, not by writing a new bash script.
If your source is clean, and it's only your backup destination that's messed up, you can easily clean up the destination by adding "--delete" to your list of arguments to "rsync". That flag tells rsync to delete any directories on the destination that do not exist on the source.
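Since --delete is destructive on the destination, a dry run first (-n / --dry-run) is cheap insurance. A small sketch on throwaway directories (source/ and dest/ are stand-ins for the real mounts; assumes rsync is installed):

```shell
# Stand-in for the rename problem: dest has both the old and new names.
mkdir -p source dest
echo a > source/2008-07-01.txt
echo a > dest/2008-07-01.txt
echo a > dest/7-1-08.txt          # leftover from the rename, backup only

# Dry run: previews what --delete would remove, changes nothing.
rsync -avn --delete source/ dest/

# Real run, once the preview looks right.
rsync -av --delete source/ dest/
ls dest
```

After the real run, dest contains only 2008-07-01.txt, matching the source.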
Upvotes: 3