Reputation: 768
I have a big data file with many columns. I would like to get the mean value of some of the columns if another column has a specific value.
For example if $19=9.1
then get the mean of $24
, $25
,$27
, $28
, $32
and $35
and write these values in a file like
9.1 (mean$24) (mean$25) ..... (mean$32) (mean$35)
and add two more lines for two other values of $19 column, for example, 11.9 and 13.9, resulting:
9.1 (mean$24) (mean$25) ..... (mean$32) (mean$35)
11.9 (mean$24) (mean$25) ..... (mean$32) (mean$35)
13.9 (mean$24) (mean$25) ..... (mean$32) (mean$35)
I have seen a post "awk average part of a column if lines (specific field) match" which makes the mean of only one column if the first has some value, but I do not know how to extend the solution to my problem.
Upvotes: 1
Views: 48
Reputation: 67467
this should work, if you fill in the blanks...
$ awk 'BEGIN {n=split("1.9 11.9 13.9",a)}
{k=$19; c[k]++; m24[k]+=$24; m25[k]+=$25; ...}
END {for(i=1;i<=n;i++) print k=a[i], m24[k]/c[k], m25[k]/c[k], ...}' file
perhaps handle c[k]=0
condition as well, with something like this:
function mean(sum,count) {return (count==0?"NaN":sum/count)}
Upvotes: 2