user3741035

Reputation: 2545

Sum values for similar lines using awk

I want to sum the scores for rows that share the same Target and miRNA. Given this input:

 Target       miRNA                 Score
 NM_198900    hsa-miR-423-5p       -0.244
 NM_198900    hsa-miR-423-5p       -0.6112
 NM_1989230   hsa-miR-413-5p       -0.644
 NM_1989230   hsa-miR-413-5p       -0.912

Output:

NM_198900      hsa-miR-423-5p       -0.8552
NM_1989230     hsa-miR-413-5p       -1.556

Upvotes: 2

Views: 72

Answers (1)

Mark Setchell

Reputation: 207650

Like this:

awk '{x[$1 " " $2]+=$3} END{for (r in x)print r,x[r]}' file

As awk reads each line, it adds the third field ($3) to the element of the array x[] indexed by fields 1 and 2 joined with a space. At the end, it prints every element of x[]. Note that the order in which for (r in x) visits the elements is unspecified.
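To see the accumulation at work, here is a small sketch run against the question's sample data. The temporary file and the `NR>1` guard are my additions, not part of the command above: without the guard, the header line would be summed too and show up as `Target miRNA 0`, because the string "Score" counts as 0 in arithmetic. The pipe through `sort` is only there to make the output order predictable.

```shell
# Hypothetical sample file built from the question's data (the temp path is an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
Target       miRNA                 Score
NM_198900    hsa-miR-423-5p       -0.244
NM_198900    hsa-miR-423-5p       -0.6112
NM_1989230   hsa-miR-413-5p       -0.644
NM_1989230   hsa-miR-413-5p       -0.912
EOF

# NR>1 skips the header line; x[] is keyed by "Target miRNA" and accumulates $3.
# sort makes the otherwise unspecified for-in order stable.
result=$(awk 'NR>1 {x[$1 " " $2] += $3} END {for (r in x) print r, x[r]}' "$data" | LC_ALL=C sort)
printf '%s\n' "$result"
rm -f "$data"
```

This should print one line per Target/miRNA pair, with the summed score as the last field.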

Following @jaypal's suggestion, you may prefer this version, which retains your header line (NR==1) and uses TABs as the Output Field Separator:

awk 'NR==1{OFS="\t";print;next} {x[$1 OFS $2]+=$3} END{for (r in x)print r,x[r]}' file
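If the rows should also come out in the order their keys first appear in the input, one possible sketch is to record each new key in a second array and loop over that in END. The names `sum`, `order`, and `n`, as well as the temporary file, are illustrative assumptions rather than anything from the commands above.

```shell
# Hypothetical sample file built from the question's data (the temp path is an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
Target       miRNA                 Score
NM_198900    hsa-miR-423-5p       -0.244
NM_198900    hsa-miR-423-5p       -0.6112
NM_1989230   hsa-miR-413-5p       -0.644
NM_1989230   hsa-miR-413-5p       -0.912
EOF

result=$(awk '
  NR == 1 { print; next }                   # pass the header through unchanged
  {
    key = $1 "\t" $2                        # composite key: Target and miRNA
    if (!(key in sum)) order[++n] = key     # remember the first-seen position
    sum[key] += $3                          # accumulate the score
  }
  END {
    for (i = 1; i <= n; i++)                # replay keys in input order
      print order[i] "\t" sum[order[i]]
  }' "$data")
printf '%s\n' "$result"
rm -f "$data"
```

The extra array costs one entry per distinct key, which is usually negligible next to the sums themselves.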

Upvotes: 4
