Reputation: 63
I'd like to know how to compare two directories (not recursively) only by filename (ignore extension) to get the difference. For example, if I have list A and B, I want to know what is present in A and not in B.
I am currently processing some images. In one directory I have source files with the extension .tiff and in the other directory I have processed files with the extension .png. The filenames are the same in both directories, but only the extension differs (ex. one file is named foo.tiff in directory A, and it is named foo.png in directory B).
I'm trying to find which files have not yet been processed.
Thanks!
Upvotes: 5
Views: 8399
Reputation: 1455
If I understand you correctly, you need the following script:
#!/bin/bash
# Print files in folder1 (extension ext1) that have no counterpart
# with the same base name and extension ext2 in folder2.
folder1="/home/vagrant/1 b"
folder2="/home/vagrant/2 a"
ext1="tiff"
ext2="png"
for fullfile in "$folder1"/*."$ext1"
do
    # If nothing matched, the glob stays literal; skip it.
    [ -e "$fullfile" ] || continue
    filename=$(basename "$fullfile")
    # Strip the extension to get the bare name.
    cleanfilename="${filename%.*}"
    if ! [ -e "$folder2/$cleanfilename.$ext2" ]
    then
        echo "$fullfile"
    fi
done
It shows files that are present in the first folder but absent from the second, like this:
admin$ mkdir 1
admin$ mkdir 2
admin$ touch 1/1.tiff
admin$ touch 1/2.tiff
admin$ touch 1/3.tiff
admin$ touch 2/1.png
admin$ touch 2/2.png
admin$ vim diff.sh
admin$ chmod +x diff.sh
admin$ ./diff.sh
/Users/admin/1/3.tiff
Upvotes: 0
Reputation: 113834
First let's create a helper function:
getfiles() { find "$1" -maxdepth 1 -type f -exec bash -c 'for f in "$@"; do basename "${f%.*}"; done' "" {} + | sort; }
If you run getfiles dirname, it will return a sorted list of files in that directory without the directory's name and without any extension. The -maxdepth 1 option means that find will not search recursively.
Now, let's compare the files in directories A and B:
diff <(getfiles A) <(getfiles B)
The output is in the usual diff format. As any of diff's normal options can be used, the output format is quite flexible.
Here are sample directories A and B, each having one file that the other doesn't have:
$ ls */
A/:
bar.png foo.png qux.png
B/:
bar.tiff baz.tiff foo.tiff
The output:
$ diff <(getfiles A) <(getfiles B)
1a2
> baz
3d3
< qux
The output correctly identifies (a) that B has a baz file that is not present in A and (b) that A has a qux file that is not present in B.
Suppose that we just want to do a one-sided comparison and find what files in B are not also in A. In this case, grep can be used:
$ grep -vxFf <(getfiles A) <(getfiles B)
baz
The options used here are:
-v tells grep to exclude matching lines.
-x tells grep to match whole lines only.
-F tells grep that the patterns are fixed strings, not regular expressions.
-f tells grep to get the list of patterns from a file or, in this case, the file-like object <(getfiles A).
Consider these files, whose names contain spaces:
$ ls */
A A/:
1 bar.png 1 foo.png 1 qux.png
B B/:
1 bar.tiff 1 baz.tiff 1 foo.tiff
The output:
$ diff <(getfiles 'A A') <(getfiles 'B B')
1a2
> 1 baz
3d3
< 1 qux
Or,
$ grep -vxFf <(getfiles 'A A') <(getfiles 'B B')
1 baz
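As an alternative to grep for the one-sided comparison, comm can do the same job on two sorted lists. This is only a sketch: getfiles is copied from above, while only_in is a hypothetical helper name that is not part of the original answer.

```shell
#!/bin/bash
# getfiles, copied from above: extension-less file names, sorted.
getfiles() { find "$1" -maxdepth 1 -type f -exec bash -c 'for f in "$@"; do basename "${f%.*}"; done' "" {} + | sort; }

# only_in (hypothetical helper): names present in directory $1 but not in $2.
# comm -23 suppresses column 2 (lines unique to the second input) and
# column 3 (lines common to both), leaving lines unique to the first input.
only_in() { comm -23 <(getfiles "$1") <(getfiles "$2"); }
```

With the first pair of sample directories, only_in B A prints baz and only_in A B prints qux. Like grep and diff, comm works line by line, and it relies on both inputs being sorted the same way, which getfiles already ensures.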
If any of your file names have newline characters in them, this will give incorrect results. At least for the grep form, this could be extended to the more general case.
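One possible way to cover that more general case is to drop the line-based tools and do the comparison inside bash with an associative array, so file names are never split on newlines. The sketch below is not the grep form extended; it is a different technique, it assumes bash 4 or later, and missing_in is a hypothetical function name.

```shell
#!/bin/bash
# missing_in SRC DST (hypothetical helper): print each regular file in SRC
# whose name, minus its extension, matches no regular file in DST.
# Non-recursive; files whose names start with a dot are skipped by the glob.
missing_in() {
    local src=$1 dst=$2 f stem
    local -A seen=()
    # Record the extension-less name of every file in DST.
    for f in "$dst"/*; do
        [ -f "$f" ] || continue
        stem=${f##*/}        # drop the directory part
        stem=${stem%.*}      # drop the last extension, if any
        seen["$stem"]=1
    done
    # Report every file in SRC whose name was not recorded.
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        stem=${f##*/}
        stem=${stem%.*}
        [[ -n ${seen["$stem"]:-} ]] || printf '%s\n' "$f"
    done
}
```

For the situation in the question, missing_in A B would list the .tiff files in A that have no processed counterpart in B. The comparison itself is safe for any file name, but the output is still newline-separated; if it is going to be parsed by another program, printing with '%s\0' instead would be one way to keep it unambiguous.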
Upvotes: 8
Reputation: 108
Hope this helps.
-q Report only whether the files differ, not the details of the differences.
-r When comparing directories, recursively compare any subdirectories found.
diff -qr /dir1 /dir2
Upvotes: 3