Find repeat in one column then subtract value in another column

Question

My input file columns are:

a   Otu1    w   4
b   Otu1    x   1
c   Otu2    y   12424
d   Otu3    z   1756

For each repeated value in the second column, I want to subtract the later fourth-column values from the first one and keep only the first row. My desired output would be:

a   Otu1    w   3
c   Otu2    y   12424
d   Otu3    z   1756

I have tried the following awk script on a small file with two columns:

a    3
a    1
b    4

awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}' small_input_file

This prints only the subtracted value:

a    2

How can I modify this script for my input file with four columns?

Thanks.

karakfa · Accepted Answer

A two-pass (double-scan) approach does not care how many records there are or whether the repeated records are consecutive:

$ awk 'NR==FNR  {a[$2]=$2 in a?a[$2]-$4:$4; next} 
       !b[$2]++ {print $1,$2,$3,a[$2]}' file{,}

a Otu1 w 3
c Otu2 y 12424
d Otu3 z 1756
