Reputation: 1338

subtracting values in one column based on another column

I have an input file as follows

I want to subtract the values in column 2 for each unique value in column 1, so the output should look like:

100A 1000
100B 50
100C 500

I have tried

 awk '{if(!a[$1])a[$1]=$2; else a[$1]=$2-a[$1]}END{ for(i in a)print i" " a[i]}' file

but the output is:

100A 0
100B 0
100C 0

Please advise.

Upvotes: 0

Answers (5)

ghoti

Reputation: 46896

So many (slight) variations on the same theme.

awk '
  !($1 in a) {a[$1]=$2; next}
  {a[$1]-=$2}
  END {for (i in a) printf "%s %d\n",i,a[i]}
' input.txt

Stack it up as a one-liner if you like.

Remember that an awk script consists of multiple condition { action } pairs, so you can sometimes express your requirements more elegantly than with an if..else. (Not saying that this is the case here - this is a simple enough awk script that it probably doesn't matter, unless you're a purist. :] )

Also, beware of testing for values the way you've done in the if condition in your question. Note that a[$1] both tests whether the value at that array index is non-zero (and non-empty) and causes the index to exist with a null value if it didn't previously exist. If you want to check for index existence, use $1 in a.
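To make the difference between the two tests concrete, here is a minimal sketch. The array name seen, the keys, and the wrapper function index_demo are made up for illustration and are not from the question.

```shell
# Hypothetical demo: compare a value test (!seen[k]) with an index test (k in seen).
index_demo() {
  awk 'BEGIN {
    seen["k"] = 0                                             # a key whose stored value is zero
    if (!seen["k"])  print "value test: k looks unseen"       # zero is indistinguishable from missing
    if ("k" in seen) print "index test: k is present"

    if (!seen["new"])  print "value test: new looks unseen"   # this reference creates seen["new"]
    if ("new" in seen) print "index test: new now exists"
    if (!("other" in seen)) print "index test: other was not created"
  }'
}

index_demo
```

All five messages are printed: the value test cannot tell a stored zero from a missing key, and merely evaluating seen["new"] is enough to create that index, whereas the in operator has no such side effect.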

Update based on a comment on your question...

If you want to subtract the last from the first entry, ignoring the ones in between, then you need to keep a record of both your firsts and your lasts. Something like this might suffice.

awk '
  !($1 in a){a[$1]=$2;next}
  {b[$1]=$2}
  END {for(i in b)if(i in a)print i,a[i]-b[i]}
' input.txt

Note that, as Ed mentioned, this produces output in random order. If you want the output ordered, you'll need an additional array to keep track of the order. For example, this will use the order in which items are first seen:

awk '
  !($1 in a) {
    a[$1]=$2;
    o[++n]=$1;
    next
  }
  {
    b[$1]=$2
  }
  END {
    for (n=1;n<=length(o);n++)
      print o[n],a[o[n]]-b[o[n]]
  }
' i

Note that the length() function being used to determine the number of elements in an array is not universal amongst dialects of awk, but it does work in both gawk and one-true-awk (used in FreeBSD and others).
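If length() on an array is not available in your awk, one possible workaround is to reuse the counter that was incremented while filling the order array. This is only a sketch: the loop variable j, the wrapper function ordered_diff, and the sample rows are assumptions, since the question's real input isn't shown. It also skips keys that were seen only once, as the unordered version above does.

```shell
# Hypothetical variant of the ordered script that does not call length() on an array.
ordered_diff() {
  awk '
    !($1 in a) { a[$1] = $2; o[++n] = $1; next }   # first sighting: store value and remember order
    { b[$1] = $2 }                                 # later sightings: keep only the latest value
    END {
      for (j = 1; j <= n; j++)                     # n already holds the number of distinct keys
        if (o[j] in b)                             # skip keys that appeared only once
          print o[j], a[o[j]] - b[o[j]]
    }
  ' "$@"
}

# Made-up sample rows, fed on standard input.
printf '%s\n' '100A 2000' '100B 150' '100A 1500' '100A 1000' '100B 100' | ordered_diff
```

With these sample rows it prints 100A 1000 followed by 100B 50, in first-seen order.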

Upvotes: 2

Ed Morton

Reputation: 204638

Given the sample input you provided, all you need is:

$ awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}' file
100A 1000
100B 50
100C 500

If that's not all you need then provide more truly representative sample input/output that includes the cases where that's not good enough.
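As an illustration of a case where the one-liner may not be good enough, consider a key that appears more than twice. The rows below are made up (the question's real input isn't shown), and pairwise_diff is just a hypothetical wrapper around the same script.

```shell
# Hypothetical wrapper around the one-liner above, fed with made-up rows.
pairwise_diff() {
  awk '$1 in a { print $1, a[$1] - $2 } { a[$1] = $2 }' "$@"
}

# Three rows for the same key: one difference is printed per consecutive pair.
printf '%s\n' '100A 2000' '100A 1500' '100A 1200' | pairwise_diff
```

This prints 100A 500 and then 100A 300, i.e. one line per consecutive pair rather than a single result per key.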

Upvotes: 1

James Brown

Reputation: 37464

In awk, using the conditional operator to either store or subtract the value keeps it tight:

$ awk '{ a[$1]+=($1 in a?-$2:$2) } END{ for(i in a)print i, a[i] }' file
100A 1000
100B 50
100C 500

Explained:

{
    a[$1]+=($1 in a?-$2:$2)  # if $1 is already in a, subtract $2 from it;
                             # otherwise store $2 as the initial value
}
END {
    for(i in a)              # go through all keys of a
        print i, a[i]        # and print each key with its value
}

Upvotes: 1

sat

Reputation: 14979

You can use this awk:

awk 'a[$1]{a[$1]=a[$1]-$2; next} {a[$1]=$2} END{for(v in a){print v, a[v]}}' file

Upvotes: 0

Kent

Reputation: 195269

This awk one-liner does the job:

 awk '{if($1 in a)a[$1]=a[$1]-$2;else a[$1]=$2}
      END{for(x in a) print x, a[x]}' file

Upvotes: 1
