Reputation: 960
I have a large file with two columns, and I want to remove lines based on duplicate entries in column 2. I want to delete all occurrences of the duplicates, not just the repeats.
I tried:
awk '!seen[$2]++' filename
But it keeps the first occurrence of each duplicate instead of removing them all.
Input file example:
1 3
2 3
4 10
1 6
5 3
Expected output:
4 10
1 6
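(For context: !seen[$2]++ prints a line only the first time its column-2 value appears, so it keeps one copy of each duplicated value. Removing every copy requires knowing the full counts first, which a single forward pass with this filter can't do.)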
Upvotes: 5
Views: 197
Reputation: 37404
Another way with sort, uniq and grep:
$ grep -v -f <(sort -k2n file | uniq -f 1 -D) file
4 10
1 6
Explained: sort sorts the file on the second field:
1 3
2 3
5 3
1 6
4 10
uniq -f 1 -D skips the first field (fields are separated by runs of blanks) and prints only the duplicated lines:
1 3
2 3
5 3
That list is an exclude list for grep.
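One caveat: grep -f treats each line of the exclude list as a regex that can match anywhere in a line, so a duplicate like 1 3 would also exclude an unrelated line such as 11 30. A stricter variant (a sketch using the same files) matches fixed strings against whole lines only:
$ grep -v -x -F -f <(sort -k2n file | uniq -f 1 -D) file
4 10
1 6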
Upvotes: 0
Reputation: 203502
$ awk 'NR==FNR{cnt[$2]++; next} cnt[$2]==1' file file
4 10
1 6
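The NR==FNR condition is true only while awk reads the first file argument (FNR resets for each file, NR does not), so the first pass counts the column-2 values and the second pass prints only the lines whose value occurred exactly once. The same script with comments (a sketch, functionally identical):
$ awk '
    NR==FNR { cnt[$2]++; next }   # 1st pass: count each column-2 value
    cnt[$2] == 1                  # 2nd pass: print lines whose value is unique
' file file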
or if you can't read the input twice (e.g. if it's coming from a pipe) then:
$ awk '{rec[NR]=$0; key[NR]=$2; cnt[$2]++} END{for (i=1; i<=NR; i++) if (cnt[key[i]] == 1) print rec[i]}' file
4 10
1 6
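Spelled out with comments (a sketch of the same one-pass logic): the script buffers every record and its key, then replays them in input order, printing only the records whose key occurred exactly once.
$ awk '
    { rec[NR] = $0; key[NR] = $2; cnt[$2]++ }   # buffer records, count keys
    END {
        for (i = 1; i <= NR; i++)               # replay in input order
            if (cnt[key[i]] == 1)               # key seen exactly once
                print rec[i]
    }' file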
Upvotes: 2
Reputation: 47099
With coreutils and grep:
# Sort on the second column
<infile sort -k2,2n |
# Count number of repeated fields in the second column
uniq -f1 -c |
# Remove fields that are repeated
grep -E '^ +1 +' |
# Squeeze white-space
tr -s ' ' |
# Remove repeat count
cut -d' ' -f3-
Output:
1 6
4 10
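Note that the output follows the sort order on column 2, not the original line order. If the original order matters, one option (a sketch along the same lines) is to number the lines first and sort back at the end:
<infile nl -ba |
# Sort on the original second column (now field 3)
sort -k3,3n |
# Count repeats, skipping the line number and column 1
uniq -f2 -c |
# Keep rows whose column-2 value occurs once
grep -E '^ +1 +' |
# Squeeze white-space and drop the count
tr -s ' ' | cut -d' ' -f3- |
# Restore the original order and drop the line numbers
sort -k1,1n | cut -f2-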
Upvotes: 0
Reputation: 133518
Could you please try the following:
awk '{seen[$2]++;value[$2]=$0} END{for(i in seen){if(seen[i]==1){print value[i]}}}' Input_file
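One caveat: for (i in seen) visits the keys in an unspecified order, so the output may not follow the input order. An order-preserving variant (a sketch of the same single-pass idea):
awk '
{
    if (!seen[$2]++)        # first time this column-2 value appears:
        order[++n] = $2     #   remember the key in input order
    value[$2] = $0          # keep the line for this key
}
END {
    for (j = 1; j <= n; j++)          # replay keys in input order
        if (seen[order[j]] == 1)      # value occurred exactly once
            print value[order[j]]
}' Input_file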
Upvotes: 3