Reputation: 263

Extract lines having same second column but different third column

I have a file with strings in three columns, as below.

a b x
a b y
a b z
a c x
a d y

I want to extract all the lines that share the same second column but have different third columns. The output I expect for the above example is:

a b x
a b y
a b z

I tried uniq -f2 and sort -u -k2, but neither works as I expect. Any suggestions, please?

Upvotes: 1

Answers (2)

Ed Morton

Reputation: 204721

awk '
    seen[$2]++ {
        if (!seen[$2,$3]++) {
            printf "%s%s\n", first[$2], $0
        }
        delete first[$2]
        next
    }
    { first[$2] = $0 ORS }
' file
a b x
a b y
a b z

Note that the above will work in any awk, for any values in your input file, does not retain the whole of the input file in memory, doesn't rely on any external tools for pre/post processing, and will produce the output lines in exactly the same order they appeared in the input.

Upvotes: 2

karakfa

Reputation: 67567

awk to the rescue!

You need to make sure all records are unique first:

$ sort file | uniq | 
  awk '{c[$2]++; a[$2]=a[$2]?a[$2]RS$0:$0}
    END{for(k in a) if(c[k]>1) print a[k]}'

a b x
a b y
a b z

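Explanation: keep a counter of second-field occurrences and aggregate the records per second field. At the end, print the aggregated records for every second-field value whose counter is greater than one. Note that for (k in a) visits keys in an unspecified order, so when several groups qualify, the order of the groups in the output is not guaranteed.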
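
For comparison, here is a minimal sketch of a different technique, not taken from either answer: a two-pass awk that reads the file twice instead of sorting it first. The function name extract_varying and the temporary sample file are hypothetical, introduced only for illustration. The first pass counts the distinct third-column values for each second-column value, and the second pass prints the lines whose second column has more than one distinct third-column value, in input order.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the answers above).
# Reads the same file twice:
#   pass 1 (NR == FNR): count distinct $3 values seen for each $2
#   pass 2: print lines whose $2 has more than one distinct $3
extract_varying() {
    awk '
        NR == FNR { if (!seen[$2, $3]++) distinct[$2]++; next }
        distinct[$2] > 1
    ' "$1" "$1"
}

# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'a b x' 'a b y' 'a b z' 'a c x' 'a d y' > "$sample"

extract_varying "$sample"
```

With the question's sample data this prints a b x, a b y and a b z. Unlike the sort | uniq pipeline, this sketch does not deduplicate: if a qualifying line appears twice in the input, it is printed twice. It also needs a regular file rather than a pipe, since the input is read two times.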

Upvotes: 1
