Reputation: 17
I want to delete duplicate files and, at the same time, create a symbolic link in place of each removed duplicate, so that one copy of the content is retained. So far I can list the duplicate files; the problem is the removal and linking step. This is the command I use:
find "$@" -type f -print0 | xargs -0 -n1 md5sum | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate
Output:
1463b527b1e7ed9ed8ef6aa953e9ee81 ./tope5final
1463b527b1e7ed9ed8ef6aa953e9ee81 ./Tests/tope5
2a6dfec6f96c20f2c2d47f6b07e4eb2f ./tope3final
2a6dfec6f96c20f2c2d47f6b07e4eb2f ./Tests/tope3
5baa4812f4a0838dbc283475feda542a ./tope1bfinal
5baa4812f4a0838dbc283475feda542a ./Tests/tope1b
69d7799197049b64f8675ed4500df76c ./tope3afinal
69d7799197049b64f8675ed4500df76c ./Tests/tope3a
945fe30c545fc0d7dc2d1cb279cf9c04 ./Tests/butter6
945fe30c545fc0d7dc2d1cb279cf9c04 ./Tests/tope6
98340fa2af27c79da7efb75ae7c01ac6 ./tope2cfinal
98340fa2af27c79da7efb75ae7c01ac6 ./Tests/tope2c
d15df73b8eaf1cd237ce96d58dc18041 ./tope1afinal
d15df73b8eaf1cd237ce96d58dc18041 ./Tests/tope1a
d5ce8f291a81c1e025d63885297d4b56 ./tope4final
d5ce8f291a81c1e025d63885297d4b56 ./Tests/tope4
ebde372904d6d2d3b73d2baf9ac16547 ./tope1cfinal
ebde372904d6d2d3b73d2baf9ac16547 ./Tests/tope1c
In this case, for example, I want to delete ./tope1cfinal and keep ./Tests/tope1c. After deleting, I also want to create a symbolic link named ./tope1cfinal pointing to ./Tests/tope1c.
Upvotes: -1
Views: 268
Reputation: 185005
You need to use a dedicated de-duplication tool, like jdupes:
jdupes -d dir1 dir2
Upvotes: -1
Reputation: 46813
One possibility: create an associative array whose keys are the md5sums and whose values are the first file found with each md5sum (the one that won't be deleted). Each time a file's md5sum is already present in the array, that file is deleted and replaced by a symbolic link to the file stored under that key (after checking that the file to delete isn't the original file itself). The script takes the directories to search as arguments; with no arguments, the search is performed inside the current directory.
#!/bin/bash
# Requires bash >= 4 (associative arrays, globstar) and GNU ln (for -r).
shopt -s globstar nullglob
(($#==0)) && set .
declare -A md5sum=() || exit 1
while (($#)); do
    # Skip empty arguments; shift first, otherwise the loop would never end
    [[ $1 ]] || { shift; continue; }
    for file in "$1"/**/*; do
        [[ -f $file ]] || continue
        h=$(md5sum < "$file") || continue
        read -r h _ <<< "$h" # This line is optional: it removes the hyphen from the md5sum output
        if [[ ${md5sum[$h]} ]]; then
            # already seen this md5sum
            [[ "$file" -ef "${md5sum[$h]}" ]] && continue # prevent unwanted removal!
            rm -- "$file" || continue
            ln -rs -- "${md5sum[$h]}" "$file"
        else
            # first time seeing this md5sum
            md5sum[$h]=$file
        fi
    done
    shift
done
(Untested, use at your own risk!)
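If you would rather keep the find/md5sum pipeline from the question, the same idea can be sketched as a function that reads the "HASH  PATH" lines on standard input. The function name link_duplicates and the DRY_RUN variable are made up for this illustration. It assumes bash 4 or later, GNU ln, and file names without newlines, backslashes, or leading/trailing whitespace (md5sum escapes some of these, and read trims whitespace). Treat it as a starting point, not a finished tool:

```shell
#!/bin/bash
# Hypothetical helper: reads md5sum-style "HASH  PATH" lines on stdin.
# The first path seen for each hash is kept; every later path with the
# same hash is replaced by a relative symlink to the kept file.
# Set DRY_RUN=1 to print the planned actions instead of performing them.
link_duplicates() {
    local -A first=()   # hash -> first file seen with that hash
    local hash path
    while read -r hash path; do
        # Ignore blank separator lines and anything that is not a regular file
        [[ $hash && -f $path ]] || continue
        if [[ ${first[$hash]} ]]; then
            # Same underlying file (hard link or existing symlink): leave it alone
            [[ $path -ef ${first[$hash]} ]] && continue
            if [[ $DRY_RUN ]]; then
                printf 'would link %s -> %s\n' "$path" "${first[$hash]}"
            else
                rm -- "$path" && ln -rs -- "${first[$hash]}" "$path"
            fi
        else
            first[$hash]=$path
        fi
    done
}
```

A possible way to call it is `find . -type f -exec md5sum {} + | LC_ALL=C sort | DRY_RUN=1 link_duplicates`, dropping DRY_RUN=1 once the printed plan looks right. Which copy is kept depends only on the order of the input lines, so adjust the sort step if you need a particular file to survive.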
Upvotes: 1