john stamos

Reputation: 1124

Recursively find directories with identical sets of filenames

I'm looking for a way to find whether any directories, from the current directory downward, are duplicates of each other, recursively.
i.e.

/user/guy/textfile1.txt
/user/guy/textfile2.txt
/user/guy/textfile3.txt
/user/girl/textfile1.txt
/user/girl/textfile2.txt
/user/girl/textfile3.txt
/user/fella/textfile1.txt
/user/fella/textfile2.txt
/user/fella/textfile3.txt
/user/fella/textfile4.txt
/user/rudiger/rudy/textfile1.txt
/user/rudiger/rudy/textfile2.txt
/user/rudiger/rudy/textfile3.txt
/user/julian/rudy/textfile1.txt
/user/julian/rudy/textfile2.txt
/user/julian/rudy/textfile3.txt

/guy, /girl, and both /rudy directories would be duplicates of one another (they all contain textfile1.txt through textfile3.txt), and so would /julian and /rudiger (each contains only rudy). We would also be checking whether any other directory contains the same files/dirs as "user". Since we are running the script from "user" as the current directory, we want to check the current directory as well for any duplicates further down the tree.

My current code works, but it's not recursive, which is an issue.

# compare the visible filenames of every pair of top-level directories
for d in */ ; do
  for d2 in */ ; do
    if [ "$d" != "$d2" ] ; then
      string1="$(ls "$d2")"
      string2="$(ls "$d")"
      if [ "$string1" == "$string2" ] ; then
        echo "The directories $d and $d2 are the same"
      fi
    fi
  done
done
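For reference, one way to make the same pairwise comparison cover the whole tree is to loop over every directory that find reports instead of only the top-level globs. This is only a sketch, assuming bash 4+ (for mapfile) and filenames without embedded newlines, since ls output is still compared as plain strings; it keeps the quadratic pairwise structure above, so each match is reported twice:

# collect every directory under (and including) the current one
mapfile -t dirs < <(find . -type d)

for d in "${dirs[@]}"; do
  for d2 in "${dirs[@]}"; do
    if [ "$d" != "$d2" ] ; then
      string1="$(ls "$d2")"
      string2="$(ls "$d")"
      if [ "$string1" == "$string2" ] ; then
        echo "The directories $d and $d2 are the same"
      fi
    fi
  done
done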

Upvotes: 0

Views: 71

Answers (1)

Charles Duffy

Reputation: 295373

#!/usr/bin/env bash
#              ^^^^- must be bash, not /bin/sh, and version 4.0 or newer.

# associative array mapping hash to first directory seen w/ same
declare -A hashes=( )

# sha256sum requiring only openssl, vs GNU coreutils
sha256sum() { openssl dgst -sha256 -r | sed -e 's@[[:space:]].*@@'; }

while IFS= read -r -d '' dirname; do
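  # the glob expands in sorted order, so two directories with the same
  # set of filenames produce the same hash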
  hash=$(cd "$dirname" && printf '%s\0' * | sha256sum)
  if [[ ${hashes[$hash]} ]]; then
    echo "COLLISION: Directory $dirname has same filenames as ${hashes[$hash]}"
  else
    hashes[$hash]=$dirname
  fi
done < <(find . -type d -print0)
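Run from the user directory in the question's example, this reports the guy/girl/rudy group and the julian/rudiger pair with lines such as the following (which directory gets named as the first-seen one depends on find's traversal order):

COLLISION: Directory ./girl has same filenames as ./guy
COLLISION: Directory ./rudiger/rudy has same filenames as ./guy
COLLISION: Directory ./julian/rudy has same filenames as ./guy
COLLISION: Directory ./julian has same filenames as ./rudiger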

Upvotes: 2
