Reputation: 1165
I have moved a web site from one server to another and I copied the files using SCP. I now wish to check that all the files have been copied OK. How do I compare the sites?
Should I count the files in each folder? Get the total file size for each folder tree? Or is there a better way to compare the sites?
Upvotes: 52
Views: 42583
Reputation: 53265
If comparing two folders on the same computer, diff is fine, as explained by the main answer.
However, if trying to compare two folders on different computers, or across a network, don't do that! If across a network, it will take forever since it has to actually transmit every byte of every file in the folder across the network. So, if you are comparing a 3 GB dir, all 3 GB have to be transferred across the network just to see if the remote dir and local dir are the same.
Instead, use a SHA256 hash. Hash the dir on one computer on that computer, and on the other computer on that computer. Here is how:
(From my answer here: How to hash all files in an entire directory, including the filenames as well as their contents):
# 1. First, cd to the dir in which the dir of interest is found. This is
# important! If you don't do this, then the paths output by find will differ
# between the two computers since the absolute paths to `mydir` differ. We are
# going to hash the paths too, not just the file contents, so this matters.
cd /home/gabriel # example on computer 1
cd /home/gabriel/dev/repos # example on computer 2
# 2. hash all files inside `mydir`, then hash the list of all hashes and their
# respective file paths. This obtains one single final hash. Sorting is
# necessary by piping to `sort` to ensure we get a consistent file order in
# order to ensure a consistent final hash result. Piping to awk extracts
# just the hash.
find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
Example run and output:
$ find eclipse-workspace -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
8f493478e7bb77f1d025cba31068c1f1c8e1eab436f8a3cf79d6e60abe2cd2e4
Do this on each computer, then ensure the hashes are the same to know if the directories are the same.
Note that the above commands ignore empty directories, file permissions, timestamps of when files were last edited, etc. For most cases though that's ok.
You can also use rsync to basically do this same thing for you, even when copying or comparing across a network.
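For example, a minimal sketch (the remote host name here is a placeholder, not from the question) that asks rsync to report differences by checksum without copying anything:
# -r recurse, -c compare by checksum instead of size/mtime,
# -n dry run (report only, transfer nothing), -i itemize each difference
rsync -rcni mydir/ othercomputer:/home/gabriel/dev/repos/mydir/
No itemized output means the two copies match.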
Upvotes: 2
Reputation: 1061
Others mentioned that a hash function does not need to be used, but I like having a visual record of what is being compared and what the differences are.
The downside is that this script is much slower than using something like diff -rq <dir1> <dir2>.
compare_2_dirs.sh:
#!/bin/bash
# Compare two directories (given as subdirectory names of the current
# directory) by listing md5 checksums of every file in each.
DIR1="${1%/}"
DIR2="${2%/}"
COMPFILE1="compare_${DIR1}.txt"
COMPFILE2="compare_${DIR2}.txt"
pushd "${DIR1}" >/dev/null
# -print0/xargs -0 handle file names with spaces; sort gives a stable order
find . -type f -print0 | sort -z | xargs -0 md5sum > "../${COMPFILE1}"
popd >/dev/null
pushd "${DIR2}" >/dev/null
find . -type f -print0 | sort -z | xargs -0 md5sum > "../${COMPFILE2}"
popd >/dev/null
diff "${COMPFILE1}" "${COMPFILE2}"
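A possible invocation, assuming both directories are subdirectories of the current directory (the names here are just examples):
./compare_2_dirs.sh site_old site_new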
Upvotes: 0
Reputation: 481
Possibly an over-convoluted way:
bc -l <<< $(find dirname -type f -exec sum {} /tmp/{} \; | cut -f1 -d' ' | paste -d- - -) | grep -v 0
Assumptions: both directories to be compared have the same name (i.e. dirname in this example), and that dirname is also present in /tmp.
Explanation:
Find the files in dirname and in /tmp/dirname and checksum each one
Join the two checksums with a minus as the delimiter, which makes each pair an expression
Pipe the expressions into the bc calculator
If the two checksums are the same, the expression evaluates to zero
grep to see if there are any non-zero results
If there is no output, the directories are identical
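As a small illustration of the paste and bc steps (the checksum values below are made up): paste joins each pair of lines into a subtraction, and bc evaluates it.
$ printf '111\n111\n222\n223\n' | paste -d- - -
111-111
222-223
$ printf '111\n111\n222\n223\n' | paste -d- - - | bc -l | grep -v 0
-1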
Hope this is helpful
Upvotes: 0
Reputation: 31
To add to the reply from Sidney: it is not strictly necessary to filter with -type f or to produce a hash. In reply to zidarsk8: you don't need to sort, since find, like ls, lists the filenames alphabetically by default. It works for empty directories as well.
To summarize, the top three answers would be (P.S. it is nice to do a dry run with rsync first):
diff -r -q /path/to/dir1 /path/to/dir2
diff <(cd dir1 && find) <(cd dir2 && find)
rsync --dry-run -avh from/my/dir newhost:/to/new/dir
Upvotes: 3
Reputation: 1570
Use diff with the recursive -r and quick -q options. It is the best and by far the fastest way to do this.
diff -r -q /path/to/dir1 /path/to/dir2
It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you if all the files are the same.
If it shows no output, all the files are the same, otherwise it will list the files that are different.
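For example, output for two directories that differ might look like this (the file names here are hypothetical):
$ diff -r -q /path/to/dir1 /path/to/dir2
Files /path/to/dir1/index.html and /path/to/dir2/index.html differ
Only in /path/to/dir1: old-logo.png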
Upvotes: 94
Reputation: 708
I would add this as a comment to Douglas Leeder's or Eineki's answer, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do
find [dir1] -type f -print0 | xargs -0 [preferred hash function] > [file1]
find [dir2] -type f -print0 | xargs -0 [preferred hash function] > [file2]
diff -y [file1] [file2]
Just from experimenting, I also like to use the -W ### argument on diff and output the result to a file, which is easier to parse and understand than reading it in the terminal.
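A sketch of that (the width of 200 columns and the output file name are arbitrary choices):
diff -y -W 200 [file1] [file2] > dir_diff.txt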
Upvotes: 0
Reputation: 301135
If you were using scp, you could probably have used rsync.
rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.
If you were doing something like this on the old host:
scp -r from/my/dir newhost:/to/new/dir
Then you could do something like
rsync -a --progress from/my/dir newhost:/to/new/dir
The '-a' is short for 'archive', which does a recursive copy and preserves permissions, ownership, etc. Check the man page for more info, as it can do a lot of clever things.
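To verify an existing copy rather than repeat it, one possible variation (same example paths as above) is a checksum-based dry run:
# -c compare by checksum, -n dry run, -i itemize what would change
rsync -acni from/my/dir newhost:/to/new/dir
Little or no itemized output means the copy already matches the source.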
Upvotes: 22
Reputation: 20783
I have moved a web site from one server to another and I copied the files using SCP
You could do this with rsync; it is great if you just want to mirror something.
/Johan
Update: Seems like @rjack beat me to the rsync answer by 6 seconds :-)
Upvotes: 0
Reputation: 11247
If you used scp, you can probably also use rsync over ssh. Note that rsync cannot copy between two remote hosts directly, so run it on one of them, e.g. on 2.example.com:
rsync -avH --delete-after 1.example.com:/path/to/your/dir /path/to/your/
rsync compares the files for you; add -c/--checksum if you want it to compare full file contents rather than just size and modification time.
Be sure to use the -n option to perform a dry-run. Check the manual page.
I prefer rsync over scp or even local cp, every time I can use it.
If rsync is not an option, md5sum can generate md5 digests and md5sum --check will verify them.
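A minimal sketch of that workflow (the paths here are placeholders):
# on the old host: record a digest for every file
cd /var/www/site && find . -type f -print0 | xargs -0 md5sum > /tmp/site.md5
# copy site.md5 to the new host, then on the new host:
cd /var/www/site && md5sum --check --quiet /tmp/site.md5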
Upvotes: 1
Reputation: 14959
Maybe you can use something similar to this:
(cd <original root dir> && find . -type f | xargs md5sum) > original
(cd <new root dir> && find . -type f | xargs md5sum) > new
diff original new
(The cd keeps the recorded paths relative, so the two lists are comparable even though the root directories differ, and -type f keeps md5sum from being handed directories.)
Upvotes: 5
Reputation: 53285
cd website
find . -type f -print | sort | xargs sha1sum
will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.
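For example, one possible way (host names and paths are hypothetical) to compare the two servers in a single step is to run the same pipeline on each over ssh and diff the results:
diff <(ssh oldhost 'cd /var/www/website && find . -type f -print | sort | xargs sha1sum') \
     <(ssh newhost 'cd /var/www/website && find . -type f -print | sort | xargs sha1sum')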
Upvotes: 11