user2891460

Find duplicates in variable

I am trying to find duplicates in a list. Right now I am searching for files with specific file extensions and storing them in a variable called 'files'.

For each file in files, I strip the path so that only the file name remains.

I then want to check this list for duplicates, but I can't get my head around it.

files=$(find /root/123 -type f \( -iname "*.txt" -o -iname "*.bat" \))

for file in $files; do
   formatted=$(echo ${file##*/})
   unique=$(echo $formatted | sort | uniq -c)
done

echo $unique

Any help is much appreciated!!

Upvotes: 0

Views: 294

Answers (2)

Pedro Lobito

Reputation: 98921

I guess you don't need to reinvent the wheel; simply use fdupes or fslint. Note that fdupes finds files with identical contents, not files that merely share a name.

Depending on your system, you can install it by using:

yum -y install fdupes

or

apt-get install fdupes

Usage of fdupes is pretty straightforward:

fdupes /path/to/dir

If you just need the .txt files, you can filter the result with grep (escaping the dot and anchoring the match to the end of the line), e.g.:

fdupes /path/to/dir | grep '\.txt$'
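If installing fdupes is not an option, a rough content-based check can be improvised with GNU coreutils instead. This is a swapped-in technique, not how fdupes works internally: it hashes every file with md5sum, sorts by hash, and uses GNU uniq's -w32 (compare only the 32-character digest) and -D (print every duplicated line). An MD5 match is strong evidence, not proof, of identical contents. The temporary directory and file names are made-up demo data.

```shell
#!/usr/bin/env bash
# Hypothetical demo data in a throwaway directory
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/copy of a.txt"
printf 'other\n' > "$dir/b.bat"

# md5sum prints "<32-hex-digest>  <path>"; sorting groups equal digests,
# and uniq -w32 -D prints every line whose first 32 characters repeat.
find "$dir" -type f -exec md5sum {} + | sort | uniq -w32 -D
```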

Upvotes: 2

Etan Reisner

Reputation: 80931

$files is not an array. It is a string.

You are splitting it on whitespace. This is not safe for filenames with spaces.

You are also globbing each word, which isn't safe for filenames that contain glob metacharacters such as *, ? or [.
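A small demonstration of both problems, using made-up file names in a temporary directory: the unquoted for loop turns three files into five iterations, because "my notes.txt" is split at the space and the file literally named "*.bat" is re-expanded as a glob that also matches "x.bat". This assumes the path returned by mktemp contains no whitespace or glob characters (the usual /tmp/tmp.XXXXXXXXXX form) and that nullglob/failglob are off, as they are by default.

```shell
#!/usr/bin/env bash
# Hypothetical file names chosen to trigger word splitting and globbing
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/my notes.txt" "$dir/*.bat" "$dir/x.bat"

files=$(find "$dir" -type f)

count=0
for file in $files; do          # unquoted: split on IFS, then glob-expanded
    printf '[%s]\n' "$file"
    count=$((count + 1))
done
echo "$count loop iterations for 3 files"
```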

See Bash FAQ 001 for how to safely operate over data line by line. Also see "Don't Read Lines With For".
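For reference, the line-by-line idiom that Bash FAQ 001 describes looks roughly like this (directory and file names are made up). IFS= preserves leading and trailing whitespace, -r keeps backslashes literal, and reading from a process substitution keeps the loop in the current shell, so count is still set afterwards. Names containing newlines would still be split; NUL-delimited output (-print0, or -printf with \0) is what addresses that case.

```shell
#!/usr/bin/env bash
# Hypothetical file names: one with a space, one with a glob character
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/my notes.txt" "$dir/*.bat" "$dir/x.bat"

count=0
while IFS= read -r file; do
    printf '[%s]\n' "${file##*/}"   # each path arrives intact, one per line
    count=$((count + 1))
done < <(find "$dir" -type f)
echo "$count files"
```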

You can also get find to produce arbitrarily formatted output with the -printf action (a GNU find extension). For example, -printf %f prints just the file name, without any path information.
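A quick comparison, assuming GNU find and a made-up directory layout: %p prints the path as found, %f only its last component. -printf does not append a newline by itself, so the format strings below end in \n.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one file inside a subdirectory
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
touch "$dir/sub/report.txt"

find "$dir" -type f -printf '%p\n'   # full path, e.g. /tmp/tmp.XXXX/sub/report.txt
find "$dir" -type f -printf '%f\n'   # file name only: report.txt
```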

You don't need echo for that variable assignment: formatted=${file##*/} works just fine.
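For illustration, with a made-up path: ${file##*/} removes the longest prefix matching */, i.e. everything up to and including the last slash, and its counterpart ${file%/*} keeps the directory part. Both are done by the shell itself; basename gives the same result here but runs an extra process.

```shell
#!/usr/bin/env bash
file='/root/123/sub dir/notes.txt'   # hypothetical path

formatted=${file##*/}   # strip longest prefix matching '*/'  -> notes.txt
dirpart=${file%/*}      # strip shortest suffix matching '/*' -> /root/123/sub dir

printf '%s\n' "$formatted" "$dirpart"
printf '%s\n' "$(basename "$file")"   # same as $formatted, via an external command
```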

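$formatted contains a single filename, so there is nothing for sort or uniq to compare it with. In addition, $unique is overwritten on every iteration, so only the result for the last file survives the loop.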
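The difference is easy to see with a hard-coded, hypothetical list of names: piping one name at a time into sort | uniq -c always reports a count of 1, while piping the whole list once, after the loop, groups the repeats.

```shell
#!/usr/bin/env bash
names=("notes.txt" "run.bat" "notes.txt")   # hypothetical file names

echo "Inside the loop (one name per pipeline):"
for name in "${names[@]}"; do
    printf '%s\n' "$name" | sort | uniq -c
done

echo "After the loop (whole list in one pipeline):"
printf '%s\n' "${names[@]}" | sort | uniq -c
```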

Putting all of the above together, and assuming that you want to detect duplicates by file name alone (ignoring the directory part and not comparing file contents):

If you aren't worried about filenames containing newlines, then you can just use this:

find /root/123 -type f \( -iname "*.txt" -o -iname "*.bat" \) -printf '%f\n' | sort | uniq -c
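If you only want the names that actually repeat, uniq -d (print one copy of each duplicated line) or filtering uniq -c through awk should do it. A sketch against a made-up directory tree, again assuming GNU find; list_names is a hypothetical helper introduced only to avoid repeating the find command.

```shell
#!/usr/bin/env bash
# Hypothetical tree: notes.txt exists twice, run.bat once
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/a" "$dir/b"
touch "$dir/a/notes.txt" "$dir/b/notes.txt" "$dir/a/run.bat"

# Hypothetical helper: matching file names only, one per line
list_names() {
    find "$dir" -type f \( -iname "*.txt" -o -iname "*.bat" \) -printf '%f\n'
}

list_names | sort | uniq -d                  # just the duplicated names
list_names | sort | uniq -c | awk '$1 > 1'   # duplicated names with their counts
```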

If you are worried about them, then you need NUL-delimited output and have to read it manually (something like this, for bash 4+):

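declare -A files
# Read NUL-delimited names, so spaces, glob characters and newlines are safe
while IFS= read -r -d '' file; do
    # Count this name with a plain assignment; (( )) could expand the key twice
    files[$file]=$(( ${files[$file]:-0} + 1 ))
done < <(find /root/123 -type f \( -iname "*.txt" -o -iname "*.bat" \) -printf '%f\0')
declare -p files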
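To report only the repeated names, you could loop over the associative array's keys afterwards. The sketch below is self-contained with made-up data, including a file name with an embedded newline that the NUL-delimited read handles. It increments with a plain assignment rather than ((...)), since arithmetic evaluation can re-expand an associative-array key, and prints names with printf %q so the newline stays visible. It requires bash 4+ and GNU find.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays). Hypothetical demo tree.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/a" "$dir/b"
touch "$dir/a/notes.txt" "$dir/b/notes.txt" "$dir/a/run.bat"
touch "$dir/a/"$'two\nlines.txt' "$dir/b/"$'two\nlines.txt'

declare -A counts
while IFS= read -r -d '' file; do
    counts[$file]=$(( ${counts[$file]:-0} + 1 ))   # count each name
done < <(find "$dir" -type f \( -iname "*.txt" -o -iname "*.bat" \) -printf '%f\0')

# Collect and print only the names seen more than once
dups=()
for name in "${!counts[@]}"; do
    if [ "${counts[$name]}" -gt 1 ]; then
        dups+=("$name")
        printf '%q appears %d times\n' "$name" "${counts[$name]}"
    fi
done
```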

Upvotes: 1
