Pbearne

Reputation: 1165

How do I check that two folders are the same in Linux?

I have moved a web site from one server to another and I copied the files using SCP. I now wish to check that all the files have been copied OK. How do I compare the sites?

Count the files in each folder? Get the total file size for each folder tree? Or is there a better way to compare the sites?

Upvotes: 52

Views: 42583

Answers (13)

Gabriel Staples

Reputation: 53265

...when comparing two folders across a network drive or on separate computers

If comparing two folders on the same computer, diff is fine, as explained by the main answer.

However, if trying to compare two folders on different computers, or across a network, don't do that! If across a network, it will take forever since it has to actually transmit every byte of every file in the folder across the network. So, if you are comparing a 3 GB dir, all 3 GB have to be transferred across the network just to see if the remote dir and local dir are the same.

Instead, use a SHA256 hash: hash the dir locally on each computer, then compare the two resulting hashes. Here is how:

(From my answer here: How to hash all files in an entire directory, including the filenames as well as their contents):

# 1. First, cd to the dir in which the dir of interest is found. This is
# important! If you don't do this, then the paths output by find will differ
# between the two computers since the absolute paths to `mydir` differ. We are
# going to hash the paths too, not just the file contents, so this matters. 
cd /home/gabriel            # example on computer 1
cd /home/gabriel/dev/repos  # example on computer 2

# 2. hash all files inside `mydir`, then hash the list of all hashes and their
# respective file paths. This obtains one single final hash. Sorting is
# necessary by piping to `sort` to ensure we get a consistent file order in
# order to ensure a consistent final hash result. Piping to awk extracts 
# just the hash.
find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'

Example run and output:

$ find eclipse-workspace -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
8f493478e7bb77f1d025cba31068c1f1c8e1eab436f8a3cf79d6e60abe2cd2e4

Do this on each computer, then ensure the hashes are the same to know if the directories are the same.

Note that the above commands ignore empty directories, file permissions, timestamps of when files were last edited, etc. For most cases though that's ok.
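A self-contained way to sanity-check this pipeline is to run it against two local copies of a directory, standing in for the two computers. All paths below are throwaway examples:

```shell
#!/bin/sh
# Sketch: two local copies of `mydir` play the role of the two computers.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/a/mydir/sub" "$tmp/b/mydir/sub"
echo hello > "$tmp/a/mydir/sub/f.txt"
cp "$tmp/a/mydir/sub/f.txt" "$tmp/b/mydir/sub/f.txt"

# Same pipeline as above, run once per "computer":
h1=$(cd "$tmp/a" && find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}')
h2=$(cd "$tmp/b" && find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}')

[ "$h1" = "$h2" ] && echo "directories match"
rm -rf "$tmp"
```

Because both runs `cd` to the parent of `mydir` first, the relative paths being hashed are identical, so identical contents produce identical final hashes.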

You can also use rsync to basically do this same thing for you, even when copying or comparing across a network.

See also

  1. My answer: Linux: compute a single hash for a given folder & contents?

Upvotes: 2

mhck

Reputation: 1061

Others mentioned that a hash function does not need to be used, but I like having a visual/record of what is being compared and what the differences are.

The downside is this script is much slower than using something like diff -rq <dir1> <dir2>.

compare_2_dirs.sh:

#!/bin/bash

DIR1="${1%/}"
DIR2="${2%/}"

COMPFILE1="compare_${DIR1}.txt"
COMPFILE2="compare_${DIR2}.txt"

# Quoting plus -print0/-0 keep paths with spaces intact; sorting by the
# path column gives a stable order so diff only reports real differences.
pushd "${DIR1}" >/dev/null
find . -type f -print0 | xargs -0 md5sum | sort -k 2 > "../${COMPFILE1}"
popd >/dev/null

pushd "${DIR2}" >/dev/null
find . -type f -print0 | xargs -0 md5sum | sort -k 2 > "../${COMPFILE2}"
popd >/dev/null

diff "${COMPFILE1}" "${COMPFILE2}"

Upvotes: 0

Chai Ang

Reputation: 481

Possibly an overconvoluted way

bc -l <<< "$(find dirname -type f -exec sum {} /tmp/{} \; | cut -f1 -d' ' | paste -d- - -)" | grep -v '^0$'

Assumptions: both directories to be compared have the same name (i.e. dirname in this example), and that dirname also exists under /tmp.

Explanation:

Find the files in dirname and in /tmp/dirname and checksum them

Join each pair of checksums with a minus as the delimiter, which makes it an arithmetic expression

Pipe the expressions into the bc calculator

If a pair of files is identical, its expression evaluates to zero

grep to see if there are any non-zero results

If there is no output, the directories are identical

Hope this is helpful

Upvotes: 0

luz

Reputation: 31

To add to the replies from Sidney and zidarsk8: filtering with -type f and producing a hash are not strictly necessary if you only want to compare the file lists. Note, though, that find does not guarantee sorted output (it follows directory entry order), so piping through sort is the safe way to get a consistent listing; this approach also works for empty directories.

To summarize, the top 3 answers would be (P.S. it is nice to do a dry run with rsync first):

diff -r -q /path/to/dir1 /path/to/dir2

diff <(cd dir1 && find) <(cd dir2 && find)

rsync --dry-run -avh from/my/dir newhost:/to/new/dir
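A runnable sketch of the second command above, using throwaway directories (note this compares file names only, not contents):

```shell
#!/bin/bash
# Sketch: diff the file *listings* of two dirs. The explicit `sort`
# guards against find's traversal order differing between filesystems.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
touch "$tmp/dir1/common.txt" "$tmp/dir2/common.txt" "$tmp/dir2/extra.txt"

out=$(diff <(cd "$tmp/dir1" && find . | sort) <(cd "$tmp/dir2" && find . | sort) || true)
echo "$out"   # reports only the file present in dir2 but not dir1
rm -rf "$tmp"
```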

Upvotes: 3

phayes

Reputation: 1570

Use diff with the recursive -r and quick -q options. It is the best and by far the fastest way to do this.

diff -r -q /path/to/dir1 /path/to/dir2

It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you if all the files are the same.

If it shows no output, all the files are the same, otherwise it will list the files that are different.
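For instance, a throwaway sketch with one matching and one differing file:

```shell
#!/bin/sh
# Sketch: diff -r -q reports only *which* files differ, not how.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
echo same > "$tmp/dir1/a.txt"; echo same > "$tmp/dir2/a.txt"
echo old  > "$tmp/dir1/b.txt"; echo new  > "$tmp/dir2/b.txt"

out=$(diff -r -q "$tmp/dir1" "$tmp/dir2" || true)
echo "$out"   # mentions b.txt but stays silent about the identical a.txt
rm -rf "$tmp"
```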

Upvotes: 94

Sidney

Reputation: 708

I would add this as a comment to Douglas Leeder's or Eineki's answer, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do

find [dir1] -type f -print0 | xargs -0 [preferred hash function] > [file1]

find [dir2] -type f -print0 | xargs -0 [preferred hash function] > [file2]

diff -y [file1] [file2]

Just from experimenting, I also like to use the -W ### argument on diff and output it to a file; it is easier to parse and understand in the terminal.
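A runnable local sketch of the same idea, cd-ing into each directory first so the path columns line up, and sorting for a stable order (paths invented for the example):

```shell
#!/bin/sh
# Sketch: -print0 / xargs -0 keeps a filename containing a space intact.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
echo data > "$tmp/dir1/my file.txt"
cp "$tmp/dir1/my file.txt" "$tmp/dir2/my file.txt"

(cd "$tmp/dir1" && find . -type f -print0 | xargs -0 md5sum | sort) > "$tmp/file1"
(cd "$tmp/dir2" && find . -type f -print0 | xargs -0 md5sum | sort) > "$tmp/file2"

ok=$(diff "$tmp/file1" "$tmp/file2" >/dev/null && echo yes || echo no)
echo "checksum files identical: $ok"
rm -rf "$tmp"
```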

Upvotes: 0

Paul Dixon

Reputation: 301135

If you were using scp, you could probably have used rsync.

rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.

If you were doing something like this on the old host:

scp -r from/my/dir newhost:/to/new/dir

Then you could do something like

rsync -a --progress from/my/dir newhost:/to/new/dir

The '-a' is short for 'archive' which does a recursive copy and preserves permissions, ownerships etc. Check the man page for more info, as it can do a lot of clever things.

Upvotes: 22

Johan

Reputation: 20783

I have moved a web site from one server to another and I copied the files using SCP

You could do this with rsync, it is great if you just want to mirror something.

/Johan

Update: Seems like @rjack beat me to the rsync answer by 6 seconds :-)

Upvotes: 0

Giacomo

Reputation: 11247

If you used scp, you can probably also use rsync over SSH.

rsync -avH --delete-after 1.example.com:/path/to/your/dir 2.example.com:/path/to/your/

rsync does the checksums for you.

Be sure to use the -n option to perform a dry-run. Check the manual page.

I prefer rsync over scp or even local cp, every time I can use it.

If rsync is not an option, md5sum can generate md5 digests and md5sum --check will verify them.
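A minimal local sketch of that md5sum round trip (all paths invented for the example):

```shell
#!/bin/sh
# Sketch: write digests on the source side, verify them on the copy
# with md5sum --check.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/copy"
echo hello > "$tmp/src/index.html"
cp "$tmp/src/index.html" "$tmp/copy/index.html"

# Generate digests relative to the source root...
(cd "$tmp/src" && find . -type f -exec md5sum {} +) > "$tmp/digests.md5"

# ...then verify them from inside the copy.
result=$(cd "$tmp/copy" && md5sum --check "$tmp/digests.md5")
echo "$result"
rm -rf "$tmp"
```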

Upvotes: 1

Eineki

Reputation: 14959

Maybe you can use something similar to this:

(cd <original root dir> && find . -type f | xargs md5sum) > original
(cd <new root dir> && find . -type f | xargs md5sum) > new
diff original new

(-type f keeps directories out of md5sum's argument list, and cd-ing into each root first makes the paths in the two files comparable.)

Upvotes: 5

Douglas Leeder

Reputation: 53285

cd website
find . -type f -print | sort | xargs sha1sum

will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.

Upvotes: 11

driAn

Reputation: 3335

Try diffing your directory recursively. You'll get a nice summary if something is different in one of the directories.

Upvotes: 0

schnaader

Reputation: 49731

Make checksums for all files, for example using md5sum. If they're all the same for all the files and no file is missing, everything's OK.

Upvotes: 1
