Reputation: 35
This is my data structure:
First A 1385
First B 8364
First C 9734
First C 9625
Second A 3566
Second B 9625
Second B 0238
I want to remove duplicate line entries (judged by the information in columns 1 and 2) and keep only the first occurrence of each.
I want to remove First C 9625 and Second B 0238, as they are the second occurrences of First C and Second B, so that the result looks like this:
First A 1385
First B 8364
First C 9734
Second A 3566
Second B 9625
What I have tried:
awk '{print $1"\t"$2}' FILE |
sort -u |
while read LINE; do
echo $LINE |
tr ' ' '\t' |
grep -m1 -F -f - FILE
done
I am just learning bash coding and my solution is very clumsy. I believe that it is possible to do what I want in one bash command.
Upvotes: 2
Views: 84
Reputation: 203522
$ awk '!seen[$1,$2]++' file
First A 1385
First B 8364
First C 9734
Second A 3566
Second B 9625
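Here's why you need the , between the fields. Without it, $1 and $2 are concatenated directly into one string, so two different pairs of fields can end up with the same key: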
$ cat file
ab c
a bc
$
$ awk '!seen[$1,$2]++' file
ab c
a bc
$ awk '!seen[$1$2]++' file
ab c
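For anyone new to awk, the one-liner above can be written out longhand. This is only a sketch of the same idea, not a different method: the function name dedupe_first_two is made up for illustration, and it assumes the fields are separated by whitespace (awk's default). In awk, seen[$1,$2] joins the two subscripts with the built-in SUBSEP character, which is what keeps "ab","c" and "a","bc" apart.

```shell
#!/bin/sh
# Longhand version of: awk '!seen[$1,$2]++' file
# dedupe_first_two is a hypothetical helper name, not part of awk.
# Assumption: columns are separated by whitespace (awk's default).
dedupe_first_two() {
    awk '
    {
        key = $1 SUBSEP $2        # the same key that seen[$1,$2] builds
        if (!(key in seen)) {     # first time this column 1/2 pair shows up
            seen[key] = 1         # remember the pair
            print                 # keep the whole line
        }
    }' "$@"
}

# Sample data from the question, fed on stdin.
result=$(printf '%s\n' \
    'First A 1385' 'First B 8364' 'First C 9734' 'First C 9625' \
    'Second A 3566' 'Second B 9625' 'Second B 0238' | dedupe_first_two)
printf '%s\n' "$result"
```

It also accepts a file name, e.g. dedupe_first_two FILE, since the arguments are passed straight through to awk.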
Upvotes: 3