Reputation: 960
I have a large file with two columns, and I want to remove lines based on duplicate entries in column 2. I want to delete all occurrences of the duplicates, not just the repeats.
I tried:
awk '!seen[$2]++' filename
But it keeps the first occurrence of each duplicate instead of removing them all.
Input file example:
1 3
2 3
4 10
1 6
5 3
Expected output:
4 10
1 6
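(For context: !seen[$2]++ prints a line only the first time its column-2 value appears, so it keeps one copy of each duplicated value. Removing every copy requires knowing the full counts first, which a single forward pass with this filter can't do.)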
Upvotes: 5
Views: 197
Reputation: 37404
Another way with sort, uniq and grep:
$ grep -v -f <(sort -k2n file | uniq -f 1 -D) file
4 10
1 6
Explained: sort sorts the file on the second field:
1 3
2 3
5 3
1 6
4 10
uniq -f 1 -D skips the first field (fields are separated by runs of blanks) and prints only the duplicated lines:
1 3
2 3
5 3
That list is an exclude list for grep.
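One caveat: grep -f treats each line of the exclude list as a regex that can match anywhere in a line, so a duplicate like 1 3 would also exclude an unrelated line such as 11 30. A stricter variant (a sketch using the same files) matches fixed strings against whole lines only:
$ grep -v -x -F -f <(sort -k2n file | uniq -f 1 -D) file
4 10
1 6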
Upvotes: 0
Reputation: 203502
$ awk 'NR==FNR{cnt[$2]++; next} cnt[$2]==1' file file
4 10
1 6
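The NR==FNR condition is true only while awk reads the first file argument (FNR resets for each file, NR does not), so the first pass counts the column-2 values and the second pass prints only the lines whose value occurred exactly once. The same script with comments (a sketch, functionally identical):
$ awk '
    NR==FNR { cnt[$2]++; next }   # 1st pass: count each column-2 value
    cnt[$2] == 1                  # 2nd pass: print lines whose value is unique
' file file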
or if you can't read the input twice (e.g. if it's coming from a pipe) then:
$ awk '{rec[NR]=$0; key[NR]=$2; cnt[$2]++} END{for (i=1; i<=NR; i++) if (cnt[key[i]] == 1) print rec[i]}' file
4 10
1 6
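Spelled out with comments (a sketch of the same one-pass logic): the script buffers every record and its key, then replays them in input order, printing only the records whose key occurred exactly once.
$ awk '
    { rec[NR] = $0; key[NR] = $2; cnt[$2]++ }   # buffer records, count keys
    END {
        for (i = 1; i <= NR; i++)               # replay in input order
            if (cnt[key[i]] == 1)               # key seen exactly once
                print rec[i]
    }' file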
Upvotes: 2
Reputation: 47099
With coreutils and grep:
# Sort on the second column
<infile sort -k2,2n |
# Count number of repeated fields in the second column
uniq -f1 -c |
# Remove fields that are repeated
grep -E '^ +1 +' |
# Squeeze white-space
tr -s ' ' |
# Remove repeat count
cut -d' ' -f3-
Output:
1 6
4 10
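Note that the output follows the sort order on column 2, not the original line order. If the original order matters, one option (a sketch along the same lines) is to number the lines first and sort back at the end:
<infile nl -ba |
# Sort on the original second column (now field 3)
sort -k3,3n |
# Count repeats, skipping the line number and column 1
uniq -f2 -c |
# Keep rows whose column-2 value occurs once
grep -E '^ +1 +' |
# Squeeze white-space and drop the count
tr -s ' ' | cut -d' ' -f3- |
# Restore the original order and drop the line numbers
sort -k1,1n | cut -f2-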
Upvotes: 0
Reputation: 133518
Could you please try the following:
awk '{seen[$2]++;value[$2]=$0} END{for(i in seen){if(seen[i]==1){print value[i]}}}' Input_file
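One caveat: for (i in seen) visits the keys in an unspecified order, so the output may not follow the input order. An order-preserving variant (a sketch of the same single-pass idea):
awk '
{
    if (!seen[$2]++)        # first time this column-2 value appears:
        order[++n] = $2     #   remember the key in input order
    value[$2] = $0          # keep the line for this key
}
END {
    for (j = 1; j <= n; j++)          # replay keys in input order
        if (seen[order[j]] == 1)      # value occurred exactly once
            print value[order[j]]
}' Input_file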
Upvotes: 3