Reputation: 297
I have two separate directories that mostly contain the same files, but the directory structure is completely different between the two, and the filenames do not correspond either.
So, for example:
FOLDER 1
--- Subfolder A
    -file1
    -file2
--- Subfolder B
    -file3
    -file4
FOLDER 2
--- Subfolder C
    -Subfolder C1
        -file5
        -file6
        -file7
    -Subfolder C2
        -file8
        -file9
Let's suppose that file1=file5, file2=file6, file3=file7, file4=file8, and file9 is unmatched.
Is there some combination of options to the diff command that will identify the matches? Doing a recursive diff with -r doesn't seem to do the job.
Upvotes: 1
Views: 205
Reputation: 8521
This is a way to get the different and/or identical files with find and xargs:
find FOLDER1 -type f -print0 |
xargs -0 -I % find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' \+
Sample output:
Files FOLDER1/SubfolderB/file3 and FOLDER2/SubfolderC/SubfolderC1/file5 differ
Files FOLDER1/SubfolderB/file3 and FOLDER2/SubfolderC/SubfolderC1/file7 are identical
So, you can filter the ones you want with grep (see example).
Notice that this solution supports filenames with embedded spaces and special characters (e.g. newlines), so you don't have to worry about them.
For every file in FOLDER1 (find FOLDER1 -type f -print0), it executes:
find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' \+
That line calls find again to get all the files in FOLDER2 and executes the following (processed):
diff -qs --from-file="<a file from FOLDER1>" <all the files from FOLDER2>
From man diff:
--from-file=FILE1
Compare FILE1 to all operands. FILE1 can be a directory.
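To see what a single diff call with --from-file reports, here is a minimal sketch. It assumes GNU diff; the file names ref, same and other are made up for the illustration.

```shell
# Throwaway directory with one reference file and two candidates
# (the names ref/same/other are hypothetical).
tmp=$(mktemp -d)
printf '1=5\n' > "$tmp/ref"
printf '1=5\n' > "$tmp/same"
printf 'anything\n' > "$tmp/other"

# -q: only report whether files differ; -s: also report identical files.
# diff exits non-zero when any pair differs, so the status is ignored here.
# LC_ALL=C keeps the messages in English so they can be grepped.
out=$(cd "$tmp" && LC_ALL=C diff -qs --from-file=ref same other || true)
printf '%s\n' "$out"
rm -rf "$tmp"
```

Each operand gets its own "Files ... are identical" or "Files ... differ" line, which is what the grep filter relies on.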
This is the directory tree and the file content:
$ find FOLDER1 FOLDER2 -type f -exec sh -c 'echo "$0": && cat "$0"' '{}' \;
FOLDER1/SubfolderA/file1:
1=5
FOLDER1/SubfolderA/file2:
2=6
FOLDER1/SubfolderB/file3:
3=7
FOLDER1/SubfolderB/file4:
4=8
FOLDER2/SubfolderC/SubfolderC1/file5:
1=5
FOLDER2/SubfolderC/SubfolderC1/file6:
2=6
FOLDER2/SubfolderC/SubfolderC1/file7:
3=7
FOLDER2/SubfolderC/SubfolderC2/file8:
4=8
FOLDER2/SubfolderC/SubfolderC2/file9:
anything
And this is the command (pipeline) getting just the identical ones:
$ find FOLDER1 -type f -print0 |
> xargs -0 -I % find FOLDER2 -type f -exec diff -qs --from-file="%" '{}' \+ |
> grep "identical$"
Files FOLDER1/SubfolderA/file1 and FOLDER2/SubfolderC/SubfolderC1/file5 are identical
Files FOLDER1/SubfolderA/file2 and FOLDER2/SubfolderC/SubfolderC1/file6 are identical
Files FOLDER1/SubfolderB/file3 and FOLDER2/SubfolderC/SubfolderC1/file7 are identical
Files FOLDER1/SubfolderB/file4 and FOLDER2/SubfolderC/SubfolderC2/file8 are identical
bash's Process Substitution and Arrays
If you're using bash, you can first save all the FOLDER2 filenames in an array to avoid calling find for each file in FOLDER1:
# first of all, we save all the FOLDER2 filenames (recursively) in an array
folder2_files=()
# IFS= and -r keep leading/trailing whitespace and backslashes intact;
# -d '' makes read split on the NUL bytes emitted by -print0
while IFS= read -r -d '' file; do
    folder2_files+=("$file")
done < <(find FOLDER2 -type f -print0)
# now we compare each file in FOLDER1 with the files in the array
find FOLDER1 -type f -exec diff -qs --from-file='{}' "${folder2_files[@]}" \; |
grep "identical$"
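Both pipelines above run one comparison for every FOLDER1/FOLDER2 pair. For large trees, a different technique (not part of this answer) is to hash each file once and join the two lists on the checksum. The sketch below assumes GNU coreutils (sha256sum, join) and filenames without spaces or newlines; the helper name hash_tree is hypothetical.

```shell
# Reduced sample tree in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/FOLDER1/SubfolderA" "$tmp/FOLDER2/SubfolderC/SubfolderC1"
printf '1=5\n' > "$tmp/FOLDER1/SubfolderA/file1"
printf '1=5\n' > "$tmp/FOLDER2/SubfolderC/SubfolderC1/file5"
printf 'anything\n' > "$tmp/FOLDER2/SubfolderC/SubfolderC1/file9"

# Print "checksum  path" for every file under $1, sorted by checksum.
hash_tree() { find "$1" -type f -exec sha256sum {} + | LC_ALL=C sort; }
(cd "$tmp" && hash_tree FOLDER1 > h1 && hash_tree FOLDER2 > h2)

# join pairs lines with equal checksums; cut drops the checksum column,
# leaving "FOLDER1-path FOLDER2-path" for each match.
matches=$(LC_ALL=C join "$tmp/h1" "$tmp/h2" | cut -d' ' -f2-)
printf '%s\n' "$matches"
rm -rf "$tmp"
```

Equal checksums are treated as equal content here; a follow-up cmp on each reported pair would confirm the match byte for byte.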
Upvotes: 1
Reputation: 8286
Create a temporary Git repository. Add the first directory tree to it, and commit.
Remove all the files and add the second directory tree to it. Do the second commit.
A git diff between those two commits with rename detection enabled (-M, or --find-renames) will pair up files whose content moved to a new path, and you will probably see something more enlightening.
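The steps above can be sketched as follows. This is an illustration under assumptions, not a tested recipe for every Git version: the sample files, the repo path, the helper g and the commit identity are all made up for the example.

```shell
# Reduced sample trees in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/FOLDER1/SubfolderA" "$tmp/FOLDER2/SubfolderC/SubfolderC1"
printf '1=5\n' > "$tmp/FOLDER1/SubfolderA/file1"
printf '1=5\n' > "$tmp/FOLDER2/SubfolderC/SubfolderC1/file5"
printf 'anything\n' > "$tmp/FOLDER2/SubfolderC/SubfolderC1/file9"

# Temporary repository; the identity is passed inline (hypothetical values)
# so that no global Git configuration is required.
repo="$tmp/repo"
git init -q "$repo"
g() { git -C "$repo" -c user.name=tmp -c user.email=tmp@example.invalid "$@"; }

# First commit: the first tree.
cp -R "$tmp/FOLDER1" "$repo/"
g add -A && g commit -q -m 'first tree'

# Second commit: remove everything, then add the second tree.
g rm -r -q FOLDER1
cp -R "$tmp/FOLDER2" "$repo/"
g add -A && g commit -q -m 'second tree'

# -M enables rename detection; --name-status prints one line per path.
renames=$(g diff -M --name-status HEAD~1 HEAD)
printf '%s\n' "$renames"
rm -rf "$tmp"
```

Matched files show up with an R status followed by the old and new paths, while unmatched files in the second tree appear as additions (A).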
Upvotes: 0