t28292

Reputation: 573

Finding duplicates in a field and printing them in Unix bash

I have a file that contains

apple
apple
banana
orange
apple
orange

I want a script that finds the duplicates (apple and orange) and tells the user which items are repeated, for example: "apple and orange are repeated". I tried

nawk '!x[$1]++' FS="," filename

to find the repeated items. How can I print them out in Unix bash?

Upvotes: 5

Views: 6173

Answers (3)

Varun

Reputation: 691

+1 for devnull's answer. However, if the file uses spaces instead of newlines as the delimiter, the following would work:

tr '[:blank:]' "\n" < filename | sort | uniq -d

Upvotes: 4

hek2mgl

Reputation: 157947

Update:

The question has been changed significantly. When I originally answered, the input file looked like this:

apple apple banana orange apple orange
banana orange apple
...

The solution below still works, but it may be more complicated than this special case needs.


The following awk script will do the job:

awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output:

apple 3
orange 2

It is more understandable in a form like this:

#!/usr/bin/awk -f

{
  i=1;
  # iterate through every field
  while(i <= NF) {
    a[$(i++)]++; # count occurrences of every field
  }
}

# after all input lines have been read ...
END {
  for(i in a) {
    # ... print those fields which occurred more than 1 time
    if(a[i] > 1) {
      print i,a[i];
    }
  }
}

Save this as script.awk, make it executable, and run it with the input file name as its argument:

chmod +x script.awk
./script.awk your.file  
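
For the one-word-per-line input now shown in the question, a shorter single-pass variant may be enough. This is only a sketch: it takes the `!x[$1]++` test from the question, which prints the first occurrence of each value, and changes it to `== 1` so that a value is printed once, at the moment it is seen for the second time. The names print_dups_once and seen are made up for illustration.

```shell
# print_dups_once is a hypothetical wrapper name, not an existing tool
print_dups_once() {
  # seen[$1]++ yields the count *before* this line is counted;
  # it equals 1 exactly when the value appears for the second time
  awk 'seen[$1]++ == 1' "$1"
}
```

Unlike the script above, this prints no counts, and the duplicates come out in the order in which they first repeat in the file.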

Upvotes: 1

devnull

Reputation: 123458

To print the duplicate lines, you can run:

$ sort filename | uniq -d
apple
orange

If you want to print the count as well, supply the -c option to uniq:

$ sort filename | uniq -dc
      3 apple
      2 orange
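
If you want the single sentence asked for in the question, the output of uniq -d can be joined with awk. A minimal sketch, assuming one item per line; the function name report_dups and the exact wording of both messages are my own choices for illustration:

```shell
# report_dups is a hypothetical helper name
report_dups() {
  # sort groups identical lines; uniq -d keeps one copy of each repeated line
  sort "$1" | uniq -d | awk '
    NR > 1 { printf " and " }       # separator before every name but the first
    { printf "%s", $0 }             # the duplicated name itself
    END {
      if (NR > 0) print " are repeated"
      else print "no duplicates found"
    }
  '
}
```

Called as report_dups filename on the sample file, this should print "apple and orange are repeated".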

Upvotes: 11
