Reputation: 1328
I have an input file with the following data:
line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3
I am trying to find all the duplicate lines. I tried
sort filename | uniq -c
but it does not seem to be working for me.
It gives me:
1 begin
1 end
1 line1
1 line1
1 line2
1 line3
1 line3
1 line5
1 line6
1 line7
1 line9
This question may seem like a duplicate of Find duplicate lines in a file and count how many time each line was duplicated?, but the nature of the input data is different.
Please suggest a solution.
Upvotes: 6
Views: 8834
Reputation: 350
Pass the file name as the first argument to this script.
Example: find-dupes.sh name.ext
#!/usr/bin/env bash

# Check if a file name is provided
if [ $# -eq 0 ]; then
    echo "Usage: $0 [file]"
    exit 1
fi

# File to check for duplicates
file="$1"

# Check if the file exists
if [ ! -f "$file" ]; then
    echo "Error: File not found."
    exit 1
fi

# Finding duplicates
duplicates=$(sort "$file" | uniq -d)

if [ -z "$duplicates" ]; then
    printf "\n%s\n" "No duplicates were found in $file."
else
    printf "\n%s\n\n" "Duplicate lines in $file:"
    echo "$duplicates"
fi
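For example, assuming the script above is saved as find-dupes.sh and the question's sample data is in name.ext (both names are just placeholders), a run should look roughly like this:

$ bash find-dupes.sh name.ext

Duplicate lines in name.ext:

line1
line3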
Upvotes: 0
Reputation: 2807
You'll have to modify the standard de-dupe code just a tiny bit to account for this. If you want a single copy of each duplicated line, it's very much the same idea:
{m,g}awk 'NF~ __[$_]++' FS='^$'
{m,g}awk '__[$_]++==!_'
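Run against the question's sample input, either one-liner should print only the second occurrence of each repeated line:

line1
line3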
If you want every copy of the duplicates printed, then whenever the condition yields true for the first time, print 2 copies of it, plus print new matches along the way.
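A minimal sketch of that idea in plain awk (not the golfed style above; "file" is a placeholder input name):

awk '
    { n = ++seen[$0] }        # count how many times this exact line has appeared
    n == 2 { print; print }   # second sighting: emit the line twice, covering the withheld first copy
    n > 2  { print }          # every later sighting: emit once as it arrives
' file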
Usually it's way faster to de-dupe first and then sort, instead of the other way around.
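For instance (a sketch only, reusing the one-liner above; "file" is again a placeholder), the two orderings look like this:

# de-dupe in a single hashing pass, then sort only the surviving lines
awk '__[$_]++==!_' file | sort

# versus sorting every line first, then collapsing duplicates
sort file | uniq -d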
Upvotes: 0