I am trying to find duplicates in a list. Right now I am searching for files with specific file extensions and storing them in a variable called 'files'.
For each file in files I strip the path so that only the filename is left.
I then want to check this list for duplicates, but I can't get my head around it.
files=$(find /root/123 -type f \( -iname "*.txt" -o -iname "*.bat" \))
for file in $files; do
formatted=$(echo ${file##*/})
unique=$(echo $formatted | sort | uniq -c)
done
echo $unique
Any help is much appreciated!!
Upvotes: 0
Views: 294
Reputation: 98921
Find duplicates in variable
I guess you don't need to reinvent the wheel; simply use fdupes or fslint.
Depending on your system, you can install it by using:
yum -y install fdupes
or
apt-get install fdupes
Usage of fdupes is pretty straightforward:
fdupes /path/to/dir
If you just need the .txt files, you can pipe the result to grep, e.g.:
fdupes /path/to/dir | grep '\.txt$'
Upvotes: 2
Reputation: 80931
$files is not an array. It is a string.
You are splitting it on whitespace. This is not safe for filenames with spaces.
You are also globbing. This isn't safe for filenames with globbing metacharacters in the names.
See Bash FAQ 001 for how to safely operate over data line-by-line. Also see "Don't read lines with for".
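As a rough illustration of the line-by-line approach, here is a minimal sketch that reads find's output with `while IFS= read -r` instead of `for file in $files`. The function name `list_basenames` is hypothetical, and the sketch still assumes no filename contains a newline.

```shell
# Hypothetical helper: print the base name of every .txt/.bat file under "$1".
# Each path is read as one whole line, so spaces and glob characters
# (*, ?, [) in filenames are left untouched.
list_basenames() {
    find "$1" -type f \( -iname '*.txt' -o -iname '*.bat' \) |
        while IFS= read -r path; do
            # Strip everything up to and including the last slash.
            printf '%s\n' "${path##*/}"
        done
}
```

It could be used as `list_basenames /root/123 | sort | uniq -c`, so that sort and uniq see the complete list rather than one name at a time.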
You can also get find to print arbitrarily formatted output with the -printf argument (e.g. with GNU find, -printf '%f\n' prints just the file name, with no path information, one per line).
You don't need echo for that variable assignment (i.e. formatted=${file##*/} works just fine).
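A small sketch of what that parameter expansion does, using a made-up path (the value of `file` here is purely illustrative):

```shell
file='/root/123/sub dir/notes.txt'   # hypothetical example path

# ${file##*/} removes the longest prefix matching "*/",
# i.e. everything up to and including the last slash.
formatted=${file##*/}

# ${formatted%.*} removes the shortest suffix matching ".*",
# i.e. the final extension, in case the name without suffix is wanted.
stem=${formatted%.*}

printf '%s\n' "$formatted" "$stem"
```

No subshell or echo is involved, and quoting the expansions when they are used keeps names with spaces intact.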
$formatted contains a single filename. You can't really sort or uniq a single item.
Putting all the above together, and assuming that you want to detect duplicates by base name (the file name without its directory path) and not by file contents, then...
If you aren't worried about filenames with newlines then you can just use this:
find /root/123 -type f \( -iname "*.txt" -o -iname "*.bat" \) -printf '%f\n' | sort | uniq -c
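If only the repeated names are of interest, `uniq -d` prints one line per name that occurs more than once. A minimal sketch, assuming GNU find (for -printf) and filenames without newlines; `duplicate_basenames` is a hypothetical name:

```shell
# Hypothetical helper: print each base name that appears more than once
# among the .txt/.bat files under "$1".
duplicate_basenames() {
    find "$1" -type f \( -iname '*.txt' -o -iname '*.bat' \) -printf '%f\n' |
        sort |      # uniq only compares adjacent lines, so sort first
        uniq -d     # keep one copy of each line that is repeated
}
```

Note that -iname matches case-insensitively while uniq compares case-sensitively, so X.TXT and x.txt count as different names here.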
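If you are worried about them then you need to read the NUL-delimited names manually (something like this for bash 4+):
declare -A files
while IFS= read -r -d '' file; do
    # Count each base name; a plain assignment avoids re-evaluating the key in arithmetic context.
    files["$file"]=$(( ${files["$file"]:-0} + 1 ))
done < <(find /root/123 -type f \( -iname "*.txt" -o -iname "*.bat" \) -printf '%f\0')
# Dump the associative array: each key is a file name, each value its count.
declare -p files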
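To report only the names that occur more than once, the counting loop could be wrapped along these lines. This is a sketch assuming bash 4+ and GNU find; `report_duplicates` and `counts` are hypothetical names.

```shell
# Hypothetical helper: print "<count> <name>" for every base name that
# occurs more than once among the .txt/.bat files under "$1".
report_duplicates() {
    local dir=$1 name
    local -A counts=()

    # Read NUL-delimited base names so any filename is handled safely.
    while IFS= read -r -d '' name; do
        counts["$name"]=$(( ${counts["$name"]:-0} + 1 ))
    done < <(find "$dir" -type f \( -iname '*.txt' -o -iname '*.bat' \) -printf '%f\0')

    # Iteration order of an associative array is unspecified.
    for name in "${!counts[@]}"; do
        if [[ ${counts[$name]} -gt 1 ]]; then
            printf '%d %s\n' "${counts[$name]}" "$name"
        fi
    done
}
```

Calling `report_duplicates /root/123 | sort` gives a stable ordering, as long as no reported name itself contains a newline.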
Upvotes: 1