Pavlos Maragkos

Reputation: 91

Subtract values in column 1 if column 2 is the same

I have a file in the following format:

0.019059000     15150000000
0.037088000     15150000000
0.035007000     15150000001
0.047622000     15150000001
0.053359000     15150000002
0.060405000     15150000002
0.068598000     15150000003
0.081587000     15150000003

I would like to subtract the column 1 values of rows whose column 2 values are the same. For the input file above, I would like to get something like this:

0.018029 15150000000
0.012615 15150000001
0.007046 15150000002
0.012989 15150000003

All the values in column 2 of the input file come in pairs: for example, 15150000000 appears exactly twice, 15150000001 appears exactly twice, and so on.

Any help is more than welcome!

Upvotes: 1

Views: 201

Answers (3)

karakfa

Reputation: 67507

awk to the rescue! (No error checking; this assumes the two rows for each key are adjacent.)

$ awk 'p==$2 {print $1-pv,p} {p=$2; pv=$1}' file

0.018029 15150000000
0.012615 15150000001
0.007046 15150000002
0.012989 15150000003

For unsorted input, still with exactly two records per key:

$ awk '$2 in a {print $1-a[$2],$2; delete a[$2]; next} {a[$2]=$1}' file

0.018029 15150000000
0.012615 15150000001
0.007046 15150000002
0.012989 15150000003

If the second value is not always larger than the first and you want the absolute difference:

$ awk 'function abs(x) {return x<0?-x:x}
       $2 in a {print abs($1-a[$2]),$2; delete a[$2]; next} 
               {a[$2]=$1}' file
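
As a rough sketch of what the missing error checking could look like, the unsorted variant can also report any key that never finds a partner. The function name pair_diff and the wording of the warning are made up for illustration, and printf "%.6f" is an assumption chosen to match the six decimals in the desired output.

```shell
# Sketch only: pair rows by column 2, print the difference for each pair,
# and warn on stderr about keys that were seen only once.
# pair_diff is a hypothetical helper name, not part of awk.
pair_diff() {
    awk '
        # second row for this key: print difference, then forget the key
        $2 in a { printf "%.6f %s\n", $1 - a[$2], $2; delete a[$2]; next }
        # first row for this key: remember its column 1 value
                { a[$2] = $1 }
        # anything still stored never got a partner
        END     { for (k in a) print "unpaired key: " k | "cat 1>&2" }
    ' "$@"
}
```

Call it as pair_diff file. A key that appears a third time starts a new pair, so it would be reported as unpaired unless a fourth row follows.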

Upvotes: 4

James Brown

Reputation: 37404

Another awk solution, which subtracts the smaller value from the bigger one:

$ awk '{
    if($2 in a) {                              # if this $2 was already seen
        print ((s=$1-a[$2])>0?s:-s),$2         # subtract smaller from bigger
        delete a[$2]                           # delete to save memory
    } else 
        a[$2]=$1                               # else store $1, keyed by $2
}' <(shuf file)                                # shuf file to demo random order
                                               # replace with just the file name

Sample output (the row order varies because of shuf's randomness):

0.007046 15150000002
0.018029 15150000000
0.012615 15150000001
0.012989 15150000003

Upvotes: 1

glenn jackman

Reputation: 246807

How about

awk '{a[$2] = $1 - a[$2]} END {for (b in a) print a[b], b}' file

Ah, I see you have values in pairs. Go with karakfa's answer then.
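
For what it's worth, on strictly paired input the one-liner above appears to produce the same differences, because an unset a[$2] counts as 0 in arithmetic: the first row stores $1 and the second row leaves $1 minus that stored value. The catches are that for (b in a) visits keys in no defined order, and that a third row for a key would silently yield third - (second - first). A sketch that pipes the result through sort to get a stable order; the function name running_diff is hypothetical:

```shell
# Sketch only: same awk program as above, with the output sorted
# numerically on column 2 so the row order is deterministic.
# running_diff is a hypothetical helper name.
running_diff() {
    awk '{ a[$2] = $1 - a[$2] } END { for (b in a) print a[b], b }' "$@" |
        sort -k2,2n
}
```

Call it as running_diff file. Unlike the paired approaches, it holds every key in memory until the end of the input.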

Upvotes: 0
