Reputation: 1328
I have an input file with the following data:
line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3
I am trying to find all the duplicate lines. I tried
sort filename | uniq -c
but it does not seem to be working for me.
It gives me:
1 begin
1 end
1 line1
1 line1
1 line2
1 line3
1 line3
1 line5
1 line6
1 line7
1 line9
This question may seem like a duplicate of Find duplicate lines in a file and count how many time each line was duplicated?, but the nature of the input data is different.
Please suggest a solution.
Upvotes: 6
Views: 8834
Reputation: 350
Pass the file name as the first argument to this script.
Example: find-dupes.sh name.ext
#!/usr/bin/env bash

# Check if a file name is provided
if [ $# -eq 0 ]; then
    echo "Usage: $0 [file]"
    exit 1
fi

# File to check for duplicates
file="$1"

# Check if the file exists
if [ ! -f "$file" ]; then
    echo "Error: File not found."
    exit 1
fi

# Finding duplicates
duplicates=$(sort "$file" | uniq -d)

if [ -z "$duplicates" ]; then
    printf "\n%s\n" "No duplicates were found in $file."
else
    printf "\n%s\n\n" "Duplicate lines in $file:"
    echo "$duplicates"
fi
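For example, assuming the script above is saved as find-dupes.sh and the question's sample data is in name.ext (both names are just placeholders), a run should look roughly like this:

$ bash find-dupes.sh name.ext

Duplicate lines in name.ext:

line1
line3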
Upvotes: 0
Reputation: 2807
You'll have to modify the standard de-dupe code just a tiny bit to account for this. If you want a single copy of each duplicated line, it's very much the same idea:
{m,g}awk 'NF~ __[$_]++' FS='^$'
{m,g}awk '__[$_]++==!_'
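Run against the question's sample input, either one-liner should print only the second occurrence of each repeated line:

line1
line3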
If you want every copy of the duplicates printed, then whenever the condition yields true for the first time, print 2 copies of it, plus print new matches along the way.
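A minimal sketch of that idea in plain awk (not the golfed style above; "file" is a placeholder input name):

awk '
    { n = ++seen[$0] }        # count how many times this exact line has appeared
    n == 2 { print; print }   # second sighting: emit the line twice, covering the withheld first copy
    n > 2  { print }          # every later sighting: emit once as it arrives
' file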
Usually it's way faster to de-dupe first and then sort, instead of the other way around.
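For instance (a sketch only, reusing the one-liner above; "file" is again a placeholder), the two orderings look like this:

# de-dupe in a single hashing pass, then sort only the surviving lines
awk '__[$_]++==!_' file | sort

# versus sorting every line first, then collapsing duplicates
sort file | uniq -d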
Upvotes: 0