Kay

Reputation: 2067

How to delete rows whose column 2 and column 3 values match those of a previous row, using awk?

I have a file with 4 columns:

ifile.txt
3  5  2  2
1  4  2  1
4  5  7  2 
5  5  7  1 
0  0  1  1
3  5  7  3
5  4  2  2

I would like to delete the rows whose column 2 and 3 values are the same as those of some previous row. For instance, rows 2 and 7 have the same values in columns 2 and 3. Similarly, rows 3, 4 and 6 have the same values in columns 2 and 3. So I want to keep the 2nd row and delete the 7th row, and likewise keep the 3rd row and delete the 4th and 6th rows. My desired output is:

ofile.txt
3  5  2  2
1  4  2  1
4  5  7  2
0  0  1  1

I tried this command:

awk '{a[NR]=$2""$3} a[NR]!=a[NR-1]{print}' ifile.txt > ofile.txt

But it does not give my desired output.

Upvotes: 4

Views: 125

Answers (3)

anubhava

Reputation: 784968

Another shorter awk:

awk '!seen[$2,$3]++' file

3  5  2  2
1  4  2  1
4  5  7  2
0  0  1  1

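This awk command uses the composite key $2,$3 as an index into the array seen. The first time a key appears, seen[$2,$3]++ evaluates to 0 (and then increments the stored value to 1), so !seen[$2,$3]++ is true and the line is printed. On every later occurrence the stored value is already non-zero, the negation is false, and the line is skipped.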
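For anyone who finds the one-liner too terse, here is a long-hand sketch of what it does. The explicit key variable and the use of SUBSEP are my own spelling-out, not part of the answer above; awk joins the subscripts in seen[$2,$3] with SUBSEP internally, which is also why pairs such as "1","23" and "12","3" stay distinct (the question's $2""$3 would merge them). The sample file is recreated from the question, without the trailing spaces.

```shell
# Recreate the sample input from the question.
cat > ifile.txt <<'EOF'
3  5  2  2
1  4  2  1
4  5  7  2
5  5  7  1
0  0  1  1
3  5  7  3
5  4  2  2
EOF

# Long-hand equivalent of: awk '!seen[$2,$3]++' ifile.txt
awk '{
    key = $2 SUBSEP $3      # same string that seen[$2,$3] uses as its index
    if (seen[key] == 0)     # an element that was never set compares equal to 0
        print               # first time this (col2, col3) pair appears
    seen[key]++             # count the occurrence so later rows are skipped
}' ifile.txt > ofile.txt

cat ofile.txt
```

This should print the same four rows as the one-liner; the short form simply folds the test, the print and the increment into a single pattern.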

Upvotes: 4

John1024

Reputation: 113814

$ awk '!(($2,$3) in a); {a[$2,$3]}' ifile
3  5  2  2
1  4  2  1
4  5  7  2
0  0  1  1

How it works

awk reads the input file one line at a time. Each input line is divided into fields. In this case, the important fields are the second, denoted $2, and the third, denoted $3.

  • !(($2,$3) in a)

    This condition is true if $2,$3 is not a key in associative array a. Since no action is specified, when this condition is true, the default action is performed which is to print the line.

    In more detail, ($2,$3) in a is true when $2,$3 is a key of a. We, however, want the condition to be true in the opposite case, when the key is not yet present. Consequently, we apply awk's negation operator, !, to it.

  • a[$2,$3]

    This adds $2,$3 as a key of a.
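The two pieces rely on an asymmetry that is easy to miss: testing with in does not create the element, while merely referencing a[...] does. A small sketch to illustrate this, using the hard-coded pair 5,2 from the first sample row (the variable name result is only for the demo):

```shell
# "in" only tests for the key; a bare reference creates it.
result=$(awk 'BEGIN {
    if (!((5,2) in a)) print "absent before reference"
    a[5,2]                  # referencing the element is enough to create the key
    if ((5,2) in a)    print "present after reference"
}')

printf '%s\n' "$result"
```

This prints "absent before reference" followed by "present after reference", which is why the order of the two statements in the one-liner matters: the test has to run before the reference.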

Upvotes: 4

Markus

Reputation: 3317

Use a multidimensional array indexed by columns 2 and 3. You can then use the in operator to test whether you have already seen the combination.

See https://www.gnu.org/software/gawk/manual/html_node/Multidimensional.html#Multidimensional for details.
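A minimal sketch of that idea, since this answer gives no command. The function name dedup_cols_2_3 and the array name seen are made up for illustration, and the sample rows are the ones from the question, written with single spaces:

```shell
# Hypothetical helper: keep only the first row for each (column 2, column 3) pair.
dedup_cols_2_3() {
    awk '
        ($2, $3) in seen { next }       # pair already recorded: skip this row
        { seen[$2, $3] = 1; print }     # first occurrence: record it and print
    ' "$@"
}

printf '%s\n' '3 5 2 2' '1 4 2 1' '4 5 7 2' '5 5 7 1' \
              '0 0 1 1' '3 5 7 3' '5 4 2 2' | dedup_cols_2_3
```

With a file instead of a pipe, it would be called as dedup_cols_2_3 ifile.txt > ofile.txt.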

Upvotes: 1
