Reputation: 263
I have a file having strings in 3 columns as below.
a b x
a b y
a b z
a c x
a d y
I want to extract all the lines having same second column but different third column. The output I am expecting for the above example is
a b x
a b y
a b z
I tried uniq -f2 and sort -u -k2, But it isn't working as I expect. Any suggestions please.
Upvotes: 1
Views: 70
Reputation: 204721
awk '
seen[$2]++ {
if (!seen[$2,$3]++) {
printf "%s%s\n", first[$2], $0
}
delete first[$2]
next
}
{ first[$2] = $0 ORS }
' file
a b x
a b y
a b z
Note that the above will work in any awk, for any values in your input file, does not retain the whole of the input file in memory, doesn't rely on any external tools for pre/post processing, and will produce the output lines in exactly the same order they appeared in the input.
Upvotes: 2
Reputation: 67567
awk
to the rescue!
Need to make sure all records are unique first
$ sort file | uniq |
awk '{c[$2]++; a[$2]=a[$2]?a[$2]RS$0:$0}
END{for(k in a) if(c[k]>1) print a[k]}'
a b x
a b y
a b z
Explanation: keep the counter of second field occurrences and aggregate the records. At the end print the records for which the counter is greater than one.
Upvotes: 1