How to get the mean value of different column values when one column match a value

Question

I have a big data file with many columns. I would like to get the mean value of some of the columns if another column has a specific value.

For example if $19=9.1 then get the mean of $24, $25,$27, $28, $32 and $35 and write these values in a file like

9.1 (mean$24) (mean$25) ..... (mean$32) (mean$35)

and add two more lines for two other values of $19 column, for example, 11.9 and 13.9, resulting:

9.1  (mean$24) (mean$25) ..... (mean$32) (mean$35)
11.9 (mean$24) (mean$25) ..... (mean$32) (mean$35)
13.9 (mean$24) (mean$25) ..... (mean$32) (mean$35)

I have seen a post "awk average part of a column if lines (specific field) match" which makes the mean of only one column if the first has some value, but I do not know how to extend the solution to my problem.

karakfa · Accepted Answer

this should work, if you fill in the blanks...

$ awk 'BEGIN {n=split("1.9 11.9 13.9",a)} 
             {k=$19; c[k]++; m24[k]+=$24; m25[k]+=$25; ...}
         END {for(i=1;i<=n;i++) print k=a[i], m24[k]/c[k], m25[k]/c[k], ...}' file

perhaps handle c[k]=0 condition as well, with something like this:

function mean(sum,count) {return (count==0?"NaN":sum/count)}

How to get the mean value of different column values when one column match a value

Answers (1)

Related Questions