Reputation:
I need to make a shell script that "lists all identical sub-directories (recursively) under the current working directory."
I'm new to shell scripts. How do I approach this?
To me, this means something involving md5sum (?), continuing to do so for each subdirectory within the directories (recursively?). It would have been the most complicated program I'd have ever written, so I assume I'm just not aware of some shell command that does most of it for me?
I.e., how should I have approached this? All the other parts were about googling until I discovered the shell command that did 90% of it for me.
(This was for a previous assignment that I wasn't able to finish; I took a zero on this part, but I need to know how to approach it in the future.)
Upvotes: 0
Views: 493
Reputation: 37404
Maybe something like this:
$ find -type d -exec sh -c "echo -n {}\ ; sh -c \"ls -s {}; basename {}\"|md5sum " \; | awk '$2 in a {print "Match:"; print a[$2], $1; next} a[$2]=$1{next}'
Match:
./bar/foo ./foo
find will find all directories: find -type d. Output:
.
./bar
./bar/foo
./foo
ls -s {}; basename {} will print the simplified directory listing and the basename of the listed directory. For example, for directory foo, ls -s foo; basename foo outputs:
total 0
0 test
foo
Those cover the files in each dir, their sizes, and the dir name. That output is sent to md5sum, and the resulting hash is printed next to the dir name:
. 674e2573b49826d4e32dfe81d9680369 -
./bar 4c2d588c5fa9781ad63ad8e86e575e01 -
./bar/foo ff8d1569685be86366f18ea89851db35 -
./foo ff8d1569685be86366f18ea89851db35 -
Those lines are then sent to awk:
$2 in a {           # hash as array key
    print "Match:"  # separate hits in output
    print a[$2], $1 # print the matching dirs
    next            # next record
}
a[$2]=$1 {next}     # only the first dir for each hash is stored and compared to
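To see the awk stage in isolation, you can feed it the four hash lines from above by hand (using the hashes shown earlier):
$ printf '%s\n' \
    '. 674e2573b49826d4e32dfe81d9680369 -' \
    './bar 4c2d588c5fa9781ad63ad8e86e575e01 -' \
    './bar/foo ff8d1569685be86366f18ea89851db35 -' \
    './foo ff8d1569685be86366f18ea89851db35 -' |
  awk '$2 in a {print "Match:"; print a[$2], $1; next} a[$2]=$1{next}'
Match:
./bar/foo ./foo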
Test dir structure:
$ mkdir -p test/foo; mkdir -p test/bar/foo; touch test/foo/test; touch test/bar/foo/test
$ find test/
test/
test/bar
test/bar/foo
test/bar/foo/test # touch test
test/foo
test/foo/test # touch test
Upvotes: 1
Reputation: 59426
I'd be surprised to hear that there is a special Unix tool, or a special usage of a standard Unix tool, to do exactly what you describe. Maybe your understanding of the task is more complex than what the task giver intended. Maybe "identical" was meant as something concerning linking; normally, hardlinking directories is not allowed, so that probably isn't meant either.
Anyway, I'd approach this task by creating checksums for all nodes in your tree, i.e. recursively: the checksum of a plain file is the checksum of its contents, and the checksum of a directory is the checksum of its entries' names and checksums.
After creating checksums for all elements, search for duplicates by sorting the list of all checksums and looking for consecutive identical lines.
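For plain files alone, that sort-and-search step can be sketched with standard tools (a sketch assuming GNU md5sum and uniq; -w32 makes uniq compare only the 32 hex digits of the hash):
$ find . -type f -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate
Directories are what makes it harder, since their checksums have to be built recursively.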
A quick solution could be like this:
#!/bin/bash

dirchecksum() {
    if [ -f "$1" ]
    then
        # plain file: hash its contents
        checksum=$(md5sum < "$1")
    elif [ -d "$1" ]
    then
        # directory: hash the names and checksums of its entries
        # (the top dir itself is excluded before -printf so its empty
        # %P doesn't leak into the hash input)
        checksum=$(
            find "$1" -maxdepth 1 \( ! -path "$1" \) -printf "%P " \
                -exec bash -c 'dirchecksum "$1"' _ {} \; |
            md5sum
        )
    fi
    echo "$checksum"          # pass the checksum up the recursion (stdout)
    echo "$checksum $1" 1>&3  # report line for the final list (fd 3)
}
export -f dirchecksum
list=$(dirchecksum "$1" 3>&1 1>/dev/null)
lastChecksum=''
while read -r checksum _ path
do
if [ "$checksum" = "$lastChecksum" ]
then
echo "duplicate found: $path = $lastPath"
fi
lastChecksum=$checksum
lastPath=$path
done < <(sort <<< "$list")
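Saved as, say, dirdups.sh (the name is just for illustration) and run on a test tree like the one from the other answer (mkdir -p test/foo test/bar/foo; touch test/foo/test test/bar/foo/test), it would report both the duplicate directories and the duplicate empty files inside them; the relative order of the two lines depends on how the checksums happen to sort:
$ bash dirdups.sh test
duplicate found: test/foo = test/bar/foo
duplicate found: test/foo/test = test/bar/foo/test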
This script uses two tricks which might not be clear, so I mention them:
To call a shell function from within find -exec, one can export -f it (done right below the function definition) and then run it via bash -c ....
The sorting at the end uses the list given out via fd 3 as input; the function's stdout is only used to pass checksums up the recursion.
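Both tricks fit in a few lines on their own (a minimal sketch; the function name visit is made up):
visit() {
    echo "stdout: $1"       # captured by the caller and thrown away below
    echo "report: $1" 1>&3  # side channel on fd 3
}
export -f visit             # make the function visible to the bash started by find
report=$(find . -maxdepth 1 -type d -exec bash -c 'visit "$1"' _ {} \; 3>&1 1>/dev/null)
echo "$report"              # only the fd-3 lines survive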
Upvotes: 1