Reputation: 1328
I have an input file as follows:
100A 2000
100B 150
100C 800
100A 1000
100B 100
100C 300
I want to subtract the values in column 2 for each unique value in column 1, so the output should look like this:
100A 1000
100B 50
100C 500
I have tried
awk '{if(!a[$1])a[$1]=$2; else a[$1]=$2-a[$1]}END{ for(i in a)print i" " a[i]}' file
but the output is:
100A 0
100B 0
100C 0
Please advise.
Upvotes: 0
Views: 398
Reputation: 46826
So many (slight) variations on the same theme.
awk '
!($1 in a) {a[$1]=$2; next}
{a[$1]-=$2}
END {for (i in a) printf "%s %d\n",i,a[i]}
' input.txt
Stack it up as a one-liner if you like.
Remember that an awk program consists of multiple condition { statement } pairs, so you can sometimes express your requirements more elegantly than by using an if..else. (Not saying that this is the case here - this is a simple enough awk script that it probably doesn't matter, unless you're a purist. :] )
Also, beware of testing for values the way you've done in the condition of the if in your question. Note that a[$1] both tests whether the value at that array index is non-zero (and non-empty) and causes the index to exist with a null value if it didn't previously exist. If you want to check for index existence, use $1 in a.
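To make that difference concrete, here is a small sketch. The two-line sample input and the shell variable names (truthy, member, created) are made up for illustration, and the + 0 is only there to force a numeric value:

```shell
# Two lines with the same key; the first value is 0.
input='100A 0
100A 5'

# Truthiness test: the stored 0 looks "unset", so line 2 overwrites it.
truthy=$(printf '%s\n' "$input" | awk '
!a[$1] { a[$1] = $2 + 0; next }
       { a[$1] -= $2 }
END    { for (k in a) print k, a[k] }')

# Membership test: line 2 is subtracted from the stored 0, as intended.
member=$(printf '%s\n' "$input" | awk '
!($1 in a) { a[$1] = $2 + 0; next }
           { a[$1] -= $2 }
END        { for (k in a) print k, a[k] }')

# Merely referencing a["x"] in a condition creates the index.
created=$(awk 'BEGIN { if (a["x"]) print "never"; print (("x" in a) ? "present" : "absent") }')

echo "truthiness: $truthy"
echo "membership: $member"
echo "after reference: $created"
```

The truthiness version ends up with 100A 5 (the 0 was silently replaced), the membership version with 100A -5, and the last line reports the index as present even though nothing was ever assigned to it.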
Update based on a comment on your question...
If you want to subtract the last from the first entry, ignoring the ones in between, then you need to keep a record of both your firsts and your lasts. Something like this might suffice.
awk '
!($1 in a){a[$1]=$2;next}
{b[$1]=$2}
END {for(i in b)if(i in a)print i,a[i]-b[i]}
' input.txt
Note that, as Ed mentioned, this produces output in an arbitrary order, because the order in which for (i in ...) visits array indices is not specified. If you want the output ordered, you'll need an additional array to keep track of the order. For example, this will use the order in which items are first seen:
awk '
!($1 in a) {
a[$1]=$2;
o[++n]=$1;
next
}
{
b[$1]=$2
}
END {
for (n=1;n<=length(o);n++)
print o[n],a[o[n]]-b[o[n]]
}
' i
Note that the length() function being used to determine the number of elements in an array is not universal amongst dialects of awk, but it does work in both gawk and one-true-awk (used in FreeBSD and others).
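If you'd rather not depend on length() for arrays at all, one possible variation is to reuse the counter that is already incremented when each key is first seen. This is only a sketch: the sample input (with a middle 100A line added), the loop variable i and the shell variable result are my own, and the if (o[i] in b) guard, which skips keys that appear only once, is a choice carried over from the unordered version above.

```shell
input='100A 2000
100B 150
100A 1500
100A 1000
100B 100'

result=$(printf '%s\n' "$input" | awk '
!($1 in a) { a[$1] = $2; o[++n] = $1; next }   # first sighting: store value, record order
           { b[$1] = $2 }                      # later sightings: keep only the latest
END {
    # n already holds the number of distinct keys, so length(o) is not needed
    for (i = 1; i <= n; i++)
        if (o[i] in b) print o[i], a[o[i]] - b[o[i]]
}')
echo "$result"
```

With that input it prints 100A 1000 followed by 100B 50, ignoring the 1500 in the middle.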
Upvotes: 2
Reputation: 203229
Given the sample input you provided, all you need is:
$ awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}' file
100A 1000
100B 50
100C 500
If that's not all you need then provide more truly representative sample input/output that includes the cases where that's not good enough.
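As an illustration of such a case (the three-line input here is hypothetical, not from the question), a key that appears more than twice makes the one-liner print one difference per consecutive pair rather than a single result per key:

```shell
input='100A 2000
100A 1500
100A 1000'

# Same one-liner as above, fed a key that occurs three times.
pairs=$(printf '%s\n' "$input" | awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}')
echo "$pairs"
```

That prints 100A 500 twice (2000-1500, then 1500-1000), which may or may not be what is wanted.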
Upvotes: 1
Reputation: 37394
In awk, using the conditional operator to either store or subtract the value, to keep it tight:
$ awk '{ a[$1]+=($1 in a?-$2:$2) } END{ for(i in a)print i, a[i] }' file
100A 1000
100B 50
100C 500
Explained:
{
a[$1]+=($1 in a?-$2:$2) # if $1 in a already, subtract from it
# otherwise add value to it
}
END {
for(i in a) # go thru all a
print i, a[i] # and print keys and values
}
Upvotes: 1
Reputation: 14949
You can use this awk:
awk 'a[$1]{a[$1]=a[$1]-$2; next} {a[$1]=$2} END{for(v in a){print v, a[v]}}' file
Upvotes: 0
Reputation: 195039
This awk one-liner does the job:
awk '{if($1 in a)a[$1]=a[$1]-$2;else a[$1]=$2}
END{for(x in a) print x, a[x]}' file
Upvotes: 1