Bob
Bob

Reputation: 342

Find repeat in one column then subtract value in another column

My input file columns are:

a   Otu1    w   4
b   Otu1    x   1
c   Otu2    y   12424
d   Otu3    z   1756

I want to search for each repetition of second column, subtract their values in fourth column. My desired output would be:

a    Otu1   w   3
c   Otu2    y   12424
d   Otu3    z   1756

I have tried the following awk script in a small file with two column

a    3
a    1
b    4

awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}' small_input_file

Which gives me the subtracting value only

a    2

How can I modify this script for my input file with four columns?

Thanks.

Upvotes: 1

Views: 36

Answers (2)

James Brown
James Brown

Reputation: 37464

Here is a single pass that outputs in awk default order:

$ awk '{
    if($2 in a)                  # current $2 met before
        b[$2]-=$4                # subtract $4
    else {                       # first time meet current $2
        a[$2]=$0                 # store record to a var
        b[$2]=$4                 # and $4 to another, key with $2
    }
}
END {                            # after processing
    for(i in a) {                # iterate all stored records
        sub(/[^ ]+$/,b[i],a[i])  # replace the last space separated string with the count
        print a[i]               # output
    }
}' file

Output order appears random:

d   Otu3    z   1756
a   Otu1    w   3
c   Otu2    y   12424

Upvotes: 1

karakfa
karakfa

Reputation: 67567

a double scan algorithm won't care how many records are there or whether they are consecutive

$ awk 'NR==FNR  {a[$2]=$2 in a?a[$2]-$4:$4; next} 
       !b[$2]++ {print $1,$2,$3,a[$2]}' file{,}

a Otu1 w 3
c Otu2 y 12424
d Otu3 z 1756

Upvotes: 2

Related Questions