Reputation: 342
My input file columns are:
a Otu1 w 4
b Otu1 x 1
c Otu2 y 12424
d Otu3 z 1756
I want to search for each repetition of second column, subtract their values in fourth column. My desired output would be:
a Otu1 w 3
c Otu2 y 12424
d Otu3 z 1756
I have tried the following awk script in a small file with two column
a 3
a 1
b 4
awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}' small_input_file
Which gives me the subtracting value only
a 2
How can I modify this script for my input file with four columns?
Thanks.
Upvotes: 1
Views: 36
Reputation: 37464
Here is a single pass that outputs in awk default order:
$ awk '{
if($2 in a) # current $2 met before
b[$2]-=$4 # subtract $4
else { # first time meet current $2
a[$2]=$0 # store record to a var
b[$2]=$4 # and $4 to another, key with $2
}
}
END { # after processing
for(i in a) { # iterate all stored records
sub(/[^ ]+$/,b[i],a[i]) # replace the last space separated string with the count
print a[i] # output
}
}' file
Output order appears random:
d Otu3 z 1756
a Otu1 w 3
c Otu2 y 12424
Upvotes: 1
Reputation: 67567
a double scan algorithm won't care how many records are there or whether they are consecutive
$ awk 'NR==FNR {a[$2]=$2 in a?a[$2]-$4:$4; next}
!b[$2]++ {print $1,$2,$3,a[$2]}' file{,}
a Otu1 w 3
c Otu2 y 12424
d Otu3 z 1756
Upvotes: 2