SauceCode

Reputation: 297

Using 'diff' with mismatched directories and filenames

I have two separate directories that contain mostly the same files, but the directory structure is completely different between the two, and the filenames do not correspond either.

For example:

FOLDER 1
--- Subfolder A
    -file1
    -file2
--- Subfolder B
    -file3
    -file4

FOLDER 2
--- Subfolder C
    -Subfolder C1
        -file5
        -file6
        -file7
    -Subfolder C2
        -file8
        -file9

Suppose that file1=file5, file2=file6, file3=file7 and file4=file8, and that file9 is unmatched.

Is there some combination of options to the diff command that will identify the matches? Doing a recursive diff with -r doesn't seem to do the job.
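A quick way to see why -r falls short: diff -r pairs entries by their relative path, so when the layouts differ it never compares file contents across the two trees. The sketch below assumes GNU diff; the temporary directory and the two-file tree are illustrative only. On this tree GNU diff reports only that SubfolderA exists in FOLDER1 and SubfolderC exists in FOLDER2.

```bash
#!/usr/bin/env bash
# Sketch: show that `diff -r` matches files by relative path only.
# The temporary directory and file contents are illustrative.
cd "$(mktemp -d)" || exit 1
mkdir -p FOLDER1/SubfolderA FOLDER2/SubfolderC/SubfolderC1
printf '1=5\n' > FOLDER1/SubfolderA/file1
printf '1=5\n' > FOLDER2/SubfolderC/SubfolderC1/file5   # same content, other path

# -r recurses into subdirectories, -q only reports whether things differ
out=$(diff -rq FOLDER1 FOLDER2)
printf '%s\n' "$out"
```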

Upvotes: 1

Views: 205

Answers (2)

whoan

Reputation: 8521

This is a way to list the differing and/or identical files using find and xargs:

find FOLDER1 -type f -print0 |
xargs -0 -I % find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' \+

Sample output:

Files FOLDER1/SubfolderB/file3 and FOLDER2/SubfolderC/SubfolderC1/file5 differ
Files FOLDER1/SubfolderB/file3 and FOLDER2/SubfolderC/SubfolderC1/file7 are identical

You can then keep only the lines you want with grep (see the example below).

Note that this solution handles filenames with embedded spaces and special characters (e.g. newlines), because find -print0 and xargs -0 pass the names NUL-separated.
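A minimal sketch that runs the pipeline on a tiny tree whose directory and file names contain spaces, assuming GNU find, xargs and diff. The temporary directory, file names and contents are illustrative, not part of the original answer. Only the one identical pair should survive the grep.

```bash
#!/usr/bin/env bash
# Sketch: the find/xargs pipeline on names containing spaces.
# Paths and contents are illustrative only.
cd "$(mktemp -d)" || exit 1
mkdir -p "FOLDER1/Subfolder A" "FOLDER2/Subfolder C/Subfolder C1"
printf 'same\n'  > "FOLDER1/Subfolder A/my file"
printf 'same\n'  > "FOLDER2/Subfolder C/Subfolder C1/copy of my file"
printf 'other\n' > "FOLDER2/Subfolder C/Subfolder C1/unrelated file"

# NUL-separated names are never split on blanks by find or xargs
result=$(find FOLDER1 -type f -print0 |
  xargs -0 -I % find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' + |
  grep "identical$")
printf '%s\n' "$result"
```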

Explanation

For every file in FOLDER1 (listed by find FOLDER1 -type f -print0), xargs executes:

find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' \+

That command calls find again to list all the files in FOLDER2 and, after substitution, runs:

diff -qs --from-file="<a file from FOLDER1>" <all the files from FOLDER2>

From man diff (GNU diffutils):

--from-file=FILE1
Compare FILE1 to all operands. FILE1 can be a directory.
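To make --from-file concrete, here is a small sketch that compares one file against two operands in a single diff call. It assumes GNU diff; the file names a, b, c are illustrative. Per the options above, -q reports only whether each pair differs and -s also reports identical pairs, so this prints "Files a and b are identical" followed by "Files a and c differ".

```bash
#!/usr/bin/env bash
# Sketch of GNU diff's --from-file: one file compared against several operands.
# File names and contents are illustrative only.
cd "$(mktemp -d)" || exit 1
printf 'hello\n' > a
printf 'hello\n' > b
printf 'bye\n'   > c

# -q: only say whether files differ; -s: also report identical files
out=$(diff -qs --from-file=a b c)
printf '%s\n' "$out"
```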

Example

This is the directory tree and the content of each file:

$ find FOLDER1 FOLDER2 -type f -exec sh -c 'echo "$0": &&  cat "$0"' '{}' \;
FOLDER1/SubfolderA/file1:
1=5
FOLDER1/SubfolderA/file2:
2=6
FOLDER1/SubfolderB/file3:
3=7
FOLDER1/SubfolderB/file4:
4=8
FOLDER2/SubfolderC/SubfolderC1/file5:
1=5
FOLDER2/SubfolderC/SubfolderC1/file6:
2=6
FOLDER2/SubfolderC/SubfolderC1/file7:
3=7
FOLDER2/SubfolderC/SubfolderC2/file8:
4=8
FOLDER2/SubfolderC/SubfolderC2/file9:
anything

And this is the pipeline that prints only the identical pairs:

$ find FOLDER1 -type f -print0 |
> xargs -0 -I % find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' \+ |
> grep "identical$"
Files FOLDER1/SubfolderA/file1 and FOLDER2/SubfolderC/SubfolderC1/file5 are identical
Files FOLDER1/SubfolderA/file2 and FOLDER2/SubfolderC/SubfolderC1/file6 are identical
Files FOLDER1/SubfolderB/file3 and FOLDER2/SubfolderC/SubfolderC1/file7 are identical
Files FOLDER1/SubfolderB/file4 and FOLDER2/SubfolderC/SubfolderC2/file8 are identical
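To try the pipeline above locally, the following sketch recreates the sample tree in a fresh temporary directory (an illustrative choice, not from the original answer) and prints where it was created; cd there before running the pipeline, since the script's own cd does not persist in your shell.

```bash
#!/usr/bin/env bash
# Sketch: recreate the sample tree from this example so the commands can be
# tried locally. The temporary directory is an illustrative choice.
set -e
root=$(mktemp -d)
cd "$root"
mkdir -p FOLDER1/SubfolderA FOLDER1/SubfolderB \
         FOLDER2/SubfolderC/SubfolderC1 FOLDER2/SubfolderC/SubfolderC2

# Each FOLDER1 file has exactly one identical counterpart in FOLDER2
printf '1=5\n' > FOLDER1/SubfolderA/file1
printf '2=6\n' > FOLDER1/SubfolderA/file2
printf '3=7\n' > FOLDER1/SubfolderB/file3
printf '4=8\n' > FOLDER1/SubfolderB/file4
printf '1=5\n' > FOLDER2/SubfolderC/SubfolderC1/file5
printf '2=6\n' > FOLDER2/SubfolderC/SubfolderC1/file6
printf '3=7\n' > FOLDER2/SubfolderC/SubfolderC1/file7
printf '4=8\n' > FOLDER2/SubfolderC/SubfolderC2/file8
printf 'anything\n' > FOLDER2/SubfolderC/SubfolderC2/file9   # no counterpart

echo "Sample tree created in $root"
```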

Enhanced solution with bash's Process Substitution and Arrays

If you're using bash, you can first save all the FOLDER2 filenames in an array to avoid calling find for each file in FOLDER1:

# first of all, save all the FOLDER2 filenames (recursively) in an array;
# IFS= and -r keep leading/trailing blanks and backslashes in names intact
folder2_files=()
while IFS= read -r -d '' file; do
    folder2_files+=("$file")
done < <(find FOLDER2 -type f -print0)
# now we compare each file in FOLDER1 with the files in the array
find FOLDER1 -type f -exec diff -qs --from-file='{}' "${folder2_files[@]}" \; |
grep "identical$"
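If your bash is 4.4 or newer (the default bash on macOS is older), mapfile can fill the same array without a read loop. This is a sketch under that assumption, using GNU find and diff and an illustrative three-file tree; the comparison step is unchanged.

```bash
#!/usr/bin/env bash
# Sketch (bash >= 4.4): fill the array with mapfile instead of a read loop.
# The tiny sample tree is illustrative only.
cd "$(mktemp -d)" || exit 1
mkdir -p FOLDER1/SubfolderA FOLDER2/SubfolderC/SubfolderC1
printf '1=5\n'      > FOLDER1/SubfolderA/file1
printf '1=5\n'      > FOLDER2/SubfolderC/SubfolderC1/file5
printf 'anything\n' > FOLDER2/SubfolderC/SubfolderC1/file9

# -d '' splits the input on NUL bytes; -t strips that delimiter from each element
mapfile -d '' -t folder2_files < <(find FOLDER2 -type f -print0)

# Same comparison as above, driven by the array
matches=$(find FOLDER1 -type f -exec diff -qs --from-file='{}' "${folder2_files[@]}" \; |
  grep "identical$")
printf '%s\n' "$matches"
```

One design caveat: because the whole array is expanded onto every diff command line, a very large FOLDER2 can exceed the system's argument-length limit, which the find -exec ... + form in the first pipeline avoids by batching.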

Upvotes: 1

squadette

Reputation: 8286

Create a temporary Git repository. Add the first directory tree to it, and commit.

Remove all the files, add the second directory tree, and make a second commit.

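Running git diff between those two commits uses rename detection (on by default in recent Git versions, or requested explicitly with -M), so you will probably see something more enlightening: files whose content did not change show up as renames.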
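A minimal sketch of these steps, assuming git is installed. The temporary directory, the small sample tree and the throwaway committer identity are illustrative assumptions, not part of the answer. With --name-status, a line starting with R100 pairs an old path with a new path whose content is identical, and a line starting with A marks an unmatched new file.

```bash
#!/usr/bin/env bash
# Sketch of the Git approach: commit tree 1, replace it with tree 2, commit,
# then let git diff detect renames. Sample tree and identity are illustrative.
set -e
cd "$(mktemp -d)"
mkdir -p FOLDER1/SubfolderA FOLDER2/SubfolderC/SubfolderC1
printf '1=5\n'      > FOLDER1/SubfolderA/file1
printf '1=5\n'      > FOLDER2/SubfolderC/SubfolderC1/file5
printf 'anything\n' > FOLDER2/SubfolderC/SubfolderC1/file9

# Run git inside ./repo with a throwaway identity so commits work anywhere
g() { git -C repo -c user.name=demo -c user.email=demo@example.invalid "$@"; }

git init -q repo
cp -R FOLDER1/. repo/
g add -A && g commit -q -m "tree 1"
g rm -q -r .                 # remove every tracked file of tree 1
cp -R FOLDER2/. repo/
g add -A && g commit -q -m "tree 2"

# -M turns on rename detection explicitly
changes=$(g diff --name-status -M HEAD~1 HEAD)
printf '%s\n' "$changes"
```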

Upvotes: 0
