Gregor.Mas

Leave only first duplicate entry in bash

This is my data structure:

First   A   1385
First   B   8364
First   C   9734
First   C   9625
Second  A   3566
Second  B   9625
Second  B   0238

I want to remove duplicate lines (lines whose columns 1 and 2 match an earlier line) and keep only the first occurrence of each.
I want to remove First C 9625 and Second B 0238, because they are the second occurrences of First C and Second B, to get this result:

First   A   1385
First   B   8364
First   C   9734
Second  A   3566
Second  B   9625

What have I tried:

awk '{print $1"\t"$2}' FILE  | 
   sort -u | 
   while read LINE; do 
      echo $LINE | 
      tr ' ' '\t' | 
      grep -m1 -F -f - FILE
   done

I'm just learning bash, and my solution is very clumsy. I believe this can be done with a single command.
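
If you want to stay inside bash itself instead of calling awk, here is one possible sketch. It assumes bash 4+ (for associative arrays), and the function name first_by_two_fields_bash is made up for illustration. It reads the file once, remembers each (column 1, column 2) pair, and prints a line only the first time its pair shows up, so the original order and spacing are kept.

```shell
#!/usr/bin/env bash
# Pure-bash sketch (assumes bash 4+ for associative arrays).
# Keeps the first line for each (column 1, column 2) pair, in input order.
first_by_two_fields_bash() {
    local line c1 c2 rest key
    local -A seen=()
    # "|| [[ -n $line ]]" also handles a last line with no trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        read -r c1 c2 rest <<< "$line"          # split the line on whitespace
        key="$c1"$'\034'"$c2"                   # join with a separator unlikely to be in the data
        [[ -n ${seen[$key]+set} ]] && continue  # pair already seen: skip this line
        seen[$key]=1
        printf '%s\n' "$line"                   # print the original line unchanged
    done
}

# Usage: first_by_two_fields_bash < FILE
printf 'First   C   9734\nFirst   C   9625\nSecond  B   9625\n' | first_by_two_fields_bash
```

A read loop like this is noticeably slower than awk on large files, so it is mainly useful when you cannot call external tools.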

Answers (1)

Ed Morton

$ awk '!seen[$1,$2]++' file
First   A   1385
First   B   8364
First   C   9734
Second  A   3566
Second  B   9625
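
Spelled out, the one-liner works like this: seen is an array keyed on the first two fields, and seen[$1,$2]++ returns the count *before* incrementing it. So the ! is true (and awk's default action, print, runs) only the first time a pair appears. The long-hand version below is only an illustration of that logic; the function name first_by_two_fields is hypothetical.

```shell
#!/usr/bin/env bash
# Long-hand version of: awk '!seen[$1,$2]++'
# Prints each line whose (field 1, field 2) pair has not been seen before.
first_by_two_fields() {
    awk '{
        key = $1 SUBSEP $2     # the same key that seen[$1,$2] builds
        count = seen[key]++    # post-increment: returns the old count, then adds 1
        if (count == 0) print  # old count 0 => first occurrence => print the line
    }' "$@"
}

first_by_two_fields <<'EOF'
First   A   1385
First   B   8364
First   C   9734
First   C   9625
Second  A   3566
Second  B   9625
Second  B   0238
EOF
```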

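Here's why you need the comma between the fields: seen[$1,$2] joins the two fields with awk's SUBSEP character ("\034" by default), while seen[$1$2] simply concatenates them, so different pairs such as ab/c and a/bc end up with the same key, abc: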

$ cat file
ab c
a  bc
$
$ awk '!seen[$1,$2]++' file
ab c
a  bc
$ awk '!seen[$1$2]++' file
ab c
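
To see the keys themselves, this sketch (the function name show_keys is hypothetical) prints the key each form builds for the same two-line file, with the invisible SUBSEP character replaced by a visible | so it can be read.

```shell
#!/usr/bin/env bash
# Print the array key that seen[$1,$2] and seen[$1$2] would use for each line.
show_keys() {
    awk '{
        k = $1 SUBSEP $2       # what seen[$1,$2] uses as its key
        gsub(SUBSEP, "|", k)   # make the separator visible for display
        print "comma key: " k "   concatenated key: " $1 $2
    }' "$@"
}

printf 'ab c\na  bc\n' | show_keys
```

This should print:

comma key: ab|c   concatenated key: abc
comma key: a|bc   concatenated key: abc

so the comma form keeps the two pairs apart while plain concatenation merges them.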