Reputation: 573
I have a file that contains:
apple
apple
banana
orange
apple
orange
I want a script that finds the duplicates (apple and orange) and tells the user that apple and orange are repeated. I tried:
nawk '!x[$1]++' FS="," filename
to find the repeated items, but how can I print them out in bash on Unix?
Upvotes: 5
Views: 6173
Reputation: 691
+1 for devnul's answer. However, if the file uses spaces instead of newlines as the delimiter, the following would work:
tr -s '[:blank:]' '\n' < filename | sort | uniq -d
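To see what each stage does, here is a minimal sketch on a made-up one-line input (the temporary file and its contents are assumptions for illustration, not the asker's data). It quotes the character class so the shell cannot glob-expand it, and uses tr -s so that runs of blanks do not turn into empty lines that uniq -d would then report as a duplicate.

```shell
#!/bin/sh
# Hypothetical sample input: words separated by spaces, a double space, and a tab.
input=$(mktemp)
printf 'apple apple banana  orange\tapple orange\n' > "$input"

# Step 1: turn every blank into a newline, squeezing repeats (one word per line).
words=$(tr -s '[:blank:]' '\n' < "$input")

# Step 2: sort so equal words are adjacent, then keep one copy of each repeated word.
dups=$(printf '%s\n' "$words" | sort | uniq -d)

printf '%s\n' "$dups"
rm -f "$input"
```

On this sample input the script prints apple and orange, one per line.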
Upvotes: 4
Reputation: 157947
Update:
The question has been changed significantly. When I wrote this answer, the input file looked like this:
apple apple banana orange apple orange
banana orange apple
...
The solution still works for the new input, but it may be more complicated than this special case needs.
The following awk script will do the job:
awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file
Output:
apple 3
orange 2
It is more understandable in a form like this:
#!/usr/bin/awk -f
{
    i = 1;
    # iterate through every field
    while (i <= NF) {
        a[$(i++)]++; # count occurrences of every field
    }
}
# after all input lines have been read ...
END {
    for (i in a) {
        # ... print those fields which occurred more than 1 time
        if (a[i] > 1) {
            print i, a[i];
        }
    }
}
Save this as script.awk, make it executable, and run it with the input file name as an argument:
chmod +x script.awk
./script.awk your.file
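For the one-item-per-line input now shown in the question, a shorter awk sketch may be enough. It is a swapped-in technique rather than the script above: it keys on the whole line ($0) and prints a line the second time it is seen, so each duplicate appears once, in the order it first repeats, without sorting. The array name seen and the temporary sample file are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample file matching the question's one-item-per-line layout.
input=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$input"

# seen[$0]++ returns the count *before* incrementing, so the test is true
# exactly once per line value: on its second occurrence.
dups=$(awk 'seen[$0]++ == 1' "$input")

printf '%s\n' "$dups"
rm -f "$input"
```

This is also the mirror image of the asker's '!x[$1]++' attempt, which is true only on the first occurrence and therefore removes duplicates instead of reporting them.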
Upvotes: 1
Reputation: 123458
In order to print the duplicate lines, you can say:
$ sort filename | uniq -d
apple
orange
If you want to print the count as well, supply the -c option to uniq:
$ sort filename | uniq -dc
3 apple
2 orange
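If you want the script to tell the user which items are repeated, as the question asks, the output of uniq -d can be joined into a sentence. This is only a sketch: the message wording, the variable names, and the temporary sample file are my own assumptions.

```shell
#!/bin/sh
# Hypothetical sample file, recreated here so the sketch is self-contained.
input=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$input"

# sort groups identical lines; uniq -d keeps one copy of each repeated line.
dups=$(sort "$input" | uniq -d)

if [ -n "$dups" ]; then
    # Join the duplicate names with " and " to build the message.
    list=$(printf '%s\n' "$dups" |
        awk 'NR > 1 { printf " and " } { printf "%s", $0 } END { print "" }')
    message="The following are repeated: $list"
else
    message="No repeated lines found."
fi

echo "$message"
rm -f "$input"
```

With the sample file this prints: The following are repeated: apple and orange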
Upvotes: 11