Reputation: 317
I have a big text file like this example:
example
:
chr11 314980 314981 63 IFITM1 -131
chr11 315025 315026 54 IFITM1 -86
chr5 315085 315086 118 AHRR -53011
chr16 316087 316088 56 ITFG3 -86
chr16 316088 316089 90 ITFG3 -131
chr11 319672 319673 213 IFITM3 -131
chr11 319674 319675 514 IFITM3 -164
I want to group the rows based on the 6th column and sum the values from the 4th column for every group. the new file would have 2 columns. 1st column would be the group and the 2nd column would be sum (sum of values from column 4 from similar groups). the expected output would look like this:
expected output
:
-131 366
-86 110
-53011 118
-164 514
I am trying to do that in awk
using the following code.
sort myfile.txt | awk -F'\t' '{ sub(/..$/,"**",$6) }1' OFS='\t' | awk '{print $1 "\t" $2}' > outfile.txt
but actually it returns an empty file. do you know how to fix it?
Upvotes: 0
Views: 115
Reputation: 50760
If you're ok with sorted output, you don't need arrays:
sort -k6n file |
awk -F'\t' '
grp != $6 {
grp = $6
printf "%s%s%s%s", sum, sep, grp, FS
sum = 0
sep = ORS
} { sum += $4 } END { print sum }'
Upvotes: 1
Reputation: 14955
Use an associative array:
awk '{a[$NF]+=$4}END{for (i in a){print i, a[i]}}' file
Upvotes: 4
Reputation: 246807
I have no idea what you are thinking with your code: why you are replacing the last 2 chars on the line with asterisks? why aren't you doing any addition anywhere? why do you sort (by column 1) first?
awk -F'\t' '
{sum[$6] += $4}
END {for (key in sum) {print key, sum[key]}}
' file | column -t
Upvotes: 4