Reputation: 1
I started learning bash scripting only a few months ago, because I need personalized scripts to make managing files on old HDDs smoother. No school, no books, just the internet and trial and error. I've come across some strange behaviour of the 'uniq' command.
Although I'm trying to filter out only the duplicate files, I've somehow managed to make 'uniq' display some unique files along with the duplicates.
Since I use functions, I try to assign all the output I need to variables. Here are the lines that search for duplicates by size and name:
dup_sn_o1=$(find "${folder1_o1}" "${folder2_o1}" -mindepth 1 -path '*/lost+found' -prune -o -not -empty -iname "*$dup_sn_p1*" -printf "'%p'\t%s\n" | sort -t$'\t' -k2n );
dup_sn_o1v=$(echo -e "${dup_sn_o1}" | tr ' ' '_' | awk -v re=${re} -v th_fg=${th_fg} -v ktb=${ktb} -v hr_fg=${hr_fg} -v cy_fg=${cy_fg} 'BEGIN{ORS="\n"; FS=OFS="\t";} {n=split($1, a, /[/]/); for(i=1;i<n;i++){printf a[i]=(th_fg a[i]"/"); a[n]=(cy_fg a[n] ktb)}; printf a[n]"\t"; print $2}' | uniq -f1 -D --all-repeated=separate | tr '_' ' ');
dup_sn_o2=$(echo -e "${dup_sn_o1}" | sed "s/\(.*' \)/'\t/"| cut -d$'\t' -f 1 );
From these I get the list of files selected by name and checked for duplicates by size (dup_sn_o2), and I use it in another function to narrow things down further with 'md5sum':
dup_md5_o1=$(echo -e $dup_sn_o2 | xargs md5sum 2>/dev/null ) ;
dup_md5_o2=$(echo -e "${dup_md5_o1}" | sed 's/ /_/g; s/\(__\)/\t/g' | sort $s_o | uniq -w32 --all-repeated=separate );
But somehow it displays some unique files along with the duplicates. About 10% of the lines displayed as duplicates have no duplicate at all.
Where am I making a mistake?
To clarify: I've renamed every file in these directories whose name contained spaces, replacing the spaces with dots (.), so this is not about filenames containing spaces.
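For reference, here is a minimal sketch of what the 'uniq -f1' stage is expected to do on clean input. The paths and sizes are made-up sample data, not taken from the script. 'uniq -f1' skips the first blank-delimited field (blanks being spaces and tabs), so only the size column is compared, which is why blanks inside the path column matter at this stage.

```shell
# Hypothetical sample data: path <TAB> size, already sorted by size.
sample=$(printf '%s\t%s\n' \
  "'/a/one.txt'" 10 \
  "'/b/two.txt'" 10 \
  "'/a/three.txt'" 20 \
  "'/a/four.txt'" 30 \
  "'/b/five.txt'" 30)

# -f1 skips the first blank-delimited field, so only the size is compared.
# --all-repeated=separate prints every repeated line, with an empty line
# between groups.
dups=$(printf '%s\n' "$sample" | uniq -f1 --all-repeated=separate)
printf '%s\n' "$dups"
```

Here the line for three.txt should be dropped, because its size occurs only once. Note that this stage matches on size alone, so it can still list files whose contents differ.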
I've posted the original code exactly as it is in my script, colour codes, bells and whistles included, in case the reason for the misbehaviour lies there.
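A stripped-down version of the 'md5sum' stage, without colour codes or the 'sed' rewriting, may help to narrow down where the behaviour starts to differ. This is only a sketch: the scratch directory and file names are hypothetical, and a plain 'sort' stands in for the 'sort $s_o' call, whose options are not shown above.

```shell
# Hypothetical scratch files: a.txt and b.txt share content, c.txt does not.
tmpdir=$(mktemp -d)
printf 'same\n' > "$tmpdir/a.txt"
printf 'same\n' > "$tmpdir/b.txt"
printf 'other\n' > "$tmpdir/c.txt"

# Sort so that equal hashes are adjacent, then compare only the first
# 32 characters of each line (the length of an MD5 hash in hex).
md5_dups=$(md5sum "$tmpdir"/*.txt | sort | uniq -w32 --all-repeated=separate)
printf '%s\n' "$md5_dups"

rm -r "$tmpdir"
```

On this input only a.txt and b.txt should be listed. 'uniq' only compares adjacent lines, so the result depends on the lines reaching it sorted by hash.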
UPDATE: After replacing 'uniq' with 'awk', everything in the script behaves as expected:
awk_dup_o1=$(find "${folder1_o1}" -not -empty -path '*/lost+found' -prune -o -type f -printf "%s\n" | awk '_[$0]++==1' | sort -n | xargs -I{} -n1 find "${ots1_o1}" -type f -size {}c -print0 | xargs -0 md5sum;)
and to separate the duplicates into groups:
awk_sep_o1=$(echo -e "${awk_dup_o1}" | awk 'last!=""&&last!=$1{print""}{print;last=$1}';)
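The two 'awk' idioms used above can be tried in isolation. The sketch below uses made-up sizes and placeholder hashes ('aaa', 'bbb') instead of real 'find' and 'md5sum' output, and names the counter array 'seen' rather than '_' for readability.

```shell
# Hypothetical list of file sizes, one per line.
sizes=$(printf '%s\n' 10 20 10 30 20 10)

# seen[$0]++ yields the previous count, so the test is true only on the
# second occurrence of a value: each repeated size is printed exactly once.
repeated=$(printf '%s\n' "$sizes" | awk 'seen[$0]++ == 1' | sort -n)
printf '%s\n' "$repeated"

# Hypothetical "hash  path" lines, already ordered by hash.
hashed=$(printf '%s\n' 'aaa  /x/1' 'aaa  /y/1' 'bbb  /x/2' 'bbb  /y/2')

# Print an empty line whenever the first field changes between lines.
grouped=$(printf '%s\n' "$hashed" |
  awk 'last != "" && last != $1 { print "" } { print; last = $1 }')
printf '%s\n' "$grouped"
```

The grouping step only inserts separators; it does not drop hashes that occur once, and it relies on equal hashes being adjacent in its input.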
I may as well get on with my miserable life and use 'awk' instead of 'uniq' ('cos it works), but the question remains: what causes 'uniq' to behave the way it does?
Upvotes: 0
Views: 38