I would like to add up per-value counts into a single total. I have a data file from which I extract information.
I use the command below to take the 7th colon-separated field of column 10, count how often each value occurs, and keep only the values greater than 0.25. What I want is the total number of entries with a value greater than 0.25, but instead I get a list of every value above 0.25 together with its count:
awk -F"\t" 'NR>1{split($10,a,":"); count10[a[7]]++} END {for (i in count10) if (i>0.25) print i, count10[i]}' mygene.vcf
Sample output that I'm getting now:
0.689 7
0.648 9
0.607 83
0.279 26
What I require (the total of the counts):
125
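For reference, the required 125 is the sum of the counts in the second column of the current output (7 + 9 + 83 + 26 = 125). A minimal sketch of one way to get it without changing the original command is to pipe that output into a second awk that adds up column 2. Here printf replays the sample output as a stand-in for the original command, and sum_counts is a hypothetical helper name:

```shell
#!/bin/sh
# sum_counts (hypothetical helper): read "value count" lines on stdin
# and print the total of the count column ($2).
sum_counts() {
  awk '{ total += $2 } END { print total + 0 }'
}

# Stand-in for the original command: replay the sample output from the question.
printf '0.689 7\n0.648 9\n0.607 83\n0.279 26\n' | sum_counts   # prints 125
```

In practice, the printf line would be replaced by the original awk command from the question, keeping the pipe.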
Sample data (10th column):
1/1:27:0,27:0,37:0:0,0.741:1.0:0:98:0,59.0
1/0:26:15,11:35,37:0:0.733,0.727:0.423:0:28:56.9,60.0
1/1:55:0,55:0,38:0:0,0.527:1.0:0:183:0,59.6
1/0:49:26,23:36,36:0:0.615,0.739:0.469:0:47:60.0,58.5
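To see which number is actually compared against 0.25, here is a small sketch that prints the 7th colon-separated subfield of each sample value. subfield7 is a hypothetical helper name; -F':' splits a single value the same way split($10, a, ":") does:

```shell
#!/bin/sh
# subfield7 (hypothetical helper): print the 7th colon-separated subfield
# of each input line, i.e. the value the question's command stores in a[7].
subfield7() {
  awk -F':' '{ print $7 }'
}

printf '%s\n' \
  '1/1:27:0,27:0,37:0:0,0.741:1.0:0:98:0,59.0' \
  '1/0:26:15,11:35,37:0:0.733,0.727:0.423:0:28:56.9,60.0' \
  '1/1:55:0,55:0,38:0:0,0.527:1.0:0:183:0,59.6' \
  '1/0:49:26,23:36,36:0:0.615,0.739:0.469:0:47:60.0,58.5' |
subfield7   # prints 1.0, 0.423, 1.0, 0.469 (one per line)
```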
You basically already have it. Instead of printing on each iteration of the for loop, accumulate the counts into a sum and print that once after the loop:
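awk -F"\t" 'NR>1 {split($10, a, ":"); count10[a[7]]++}
END {
    for (i in count10)
        if (i+0 > 0.25)     # i is a string subscript; +0 forces a numeric comparison
            sum += count10[i]
    print sum+0             # +0 prints 0 instead of an empty line if nothing matched
}' mygene.vcf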
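Alternatively, when only the total matters, the intermediate count10 array is not needed: each row can be tested and counted in a single pass. This is a sketch, assuming the same tab-separated layout and, like the original command, skipping only the first line. count_above and its threshold argument are hypothetical additions, and the demo file uses placeholder values for columns 1 to 9:

```shell
#!/bin/sh
# count_above FILE THRESHOLD (hypothetical helper)
# Prints how many rows after the first have a 7th colon-separated subfield
# in column 10 that is numerically greater than THRESHOLD.
count_above() {
  awk -F'\t' -v t="$2" '
    NR > 1 {
      split($10, a, ":")
      if (a[7] + 0 > t + 0) n++   # +0 forces a numeric, not string, comparison
    }
    END { print n + 0 }           # print 0 rather than an empty line when nothing matches
  ' "$1"
}

# Demo on the four sample values from the question (columns 1 to 9 are placeholders).
demo=$(mktemp)
printf 'header\n' > "$demo"
for f in \
  '1/1:27:0,27:0,37:0:0,0.741:1.0:0:98:0,59.0' \
  '1/0:26:15,11:35,37:0:0.733,0.727:0.423:0:28:56.9,60.0' \
  '1/1:55:0,55:0,38:0:0,0.527:1.0:0:183:0,59.6' \
  '1/0:49:26,23:36,36:0:0.615,0.739:0.469:0:47:60.0,58.5'
do
  printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\t%s\n' "$f" >> "$demo"
done
count_above "$demo" 0.25   # prints 4 (values 1.0, 0.423, 1.0, 0.469)
count_above "$demo" 0.45   # prints 3 (0.423 is excluded)
rm -f "$demo"
```

On the real file this would be count_above mygene.vcf 0.25. Passing the threshold with -v avoids editing the awk program to try other cut-offs.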