Average of a given number of rows

Question

I want to get the average of the fourth column over groups of rows, where each group is determined by the value in the second column:

-1 1 22.776109913596883 0.19607208141710716
-1 1 4.2985901827923954 1.0388892840309705
-1 1 4.642271812306717 0.96197712195674756
-1 2 2.8032298255711794 1.5930763994471333
-1 2 2.9358628368936479 1.5211062387604053
-1 2 4.9987168801017106 0.8933811184867273
 1 4 2.6211673161014915 1.7037291934441456
 1 4 4.483831056393683 0.99596956735821618
 1 4 9.7189442154485732 0.4594901646050486

The expected output would be

-1 1 0.732313
-1 2 1.33585
 1 4 1.05306
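For example, the first line of the expected output is the mean of the fourth column over the three rows whose second column is 1:

```latex
\bar{x}_{(-1,\,1)} = \frac{0.19607208141710716 + 1.0388892840309705 + 0.96197712195674756}{3}
                   = \frac{2.19693848740482522}{3} \approx 0.732313
```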

I have tried:

awk '{sum+=$4} (NR%3)==0 {print $1,$2,sum/3;sum=0;}' test

which works, but I would like to generalize (NR%3)==0 so that awk notices when the value in the second column changes and treats that as the start of a new average. For instance, the first three rows have 1 in the second column, so when that value changes to 2, a new average should be calculated.

Does this make sense?

Thomas Baruchel · Accepted Answer

Try something like:

awk '{sum[$2] += $4; count[$2] += 1; }
     END { for (k in sum) { print k " " sum[k]/count[k]; } }'

Not tested but that is the idea...

With this method, the whole result is printed at the end; that may not be what you want if the input is an infinite stream, but judging from your example I think it should be fine.
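If the input really is a stream, a variant that prints each average as soon as the second column changes (what the question literally asks for) avoids waiting for END. This is a minimal sketch, not the answerer's code: it assumes rows sharing a second-column value are contiguous, as in the sample, and the helper name `group_avg` is hypothetical. It also keeps the first column of each group.

```shell
#!/bin/sh
# Hypothetical helper: prints "col1 col2 mean(col4)" for each run of
# consecutive rows that share the same column-2 value.
# Assumption: rows with the same column-2 value are contiguous (as in the sample).
group_avg() {
  awk '
    # Column 2 changed: print the finished group and reset the running totals.
    n > 0 && $2 != key { print c1, key, sum / n; sum = 0; n = 0 }
    # First row of a group: remember its first and second columns.
    n == 0 { c1 = $1; key = $2 }
    # Every row: accumulate column 4.
    { sum += $4; n++ }
    # End of input: print the last group, if there was any input at all.
    END { if (n > 0) print c1, key, sum / n }
  ' "$@"
}

# Sample data from the question.
group_avg <<'EOF'
-1 1 22.776109913596883 0.19607208141710716
-1 1 4.2985901827923954 1.0388892840309705
-1 1 4.642271812306717 0.96197712195674756
-1 2 2.8032298255711794 1.5930763994471333
-1 2 2.9358628368936479 1.5211062387604053
-1 2 4.9987168801017106 0.8933811184867273
 1 4 2.6211673161014915 1.7037291934441456
 1 4 4.483831056393683 0.99596956735821618
 1 4 9.7189442154485732 0.4594901646050486
EOF
# Prints:
# -1 1 0.732313
# -1 2 1.33585
# 1 4 1.05306
```

Only one group is held at a time, so memory use stays constant; the trade-off is that non-adjacent rows with the same key are reported as separate groups, so the input would need to be sorted on column 2 first if that can happen.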

If you want to keep the first column as well, you can do it with the same approach.
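One way to keep the first column with the same array approach is to index the arrays by both columns. awk's `sum[$1, $2]` subscript joins the two values with `SUBSEP`, and `split` recovers them at END. Because `for (k in sum)` visits keys in an unspecified order, the output is piped through `sort`. This is a sketch under those assumptions; the helper name `group_avg_all` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: averages column 4 per (column 1, column 2) pair,
# using arrays as in the answer, and prints "col1 col2 mean".
group_avg_all() {
  awk '
    # sum[$1, $2] uses the key $1 SUBSEP $2, so both columns are kept.
    { sum[$1, $2] += $4; count[$1, $2]++ }
    END {
      for (k in sum) {
        split(k, part, SUBSEP)   # part[1] = column 1, part[2] = column 2
        print part[1], part[2], sum[k] / count[k]
      }
    }
  ' "$@" | sort -k2,2n -k1,1n    # for-in order is unspecified, so sort by column 2, then 1
}

# Sample data from the question.
group_avg_all <<'EOF'
-1 1 22.776109913596883 0.19607208141710716
-1 1 4.2985901827923954 1.0388892840309705
-1 1 4.642271812306717 0.96197712195674756
-1 2 2.8032298255711794 1.5930763994471333
-1 2 2.9358628368936479 1.5211062387604053
-1 2 4.9987168801017106 0.8933811184867273
 1 4 2.6211673161014915 1.7037291934441456
 1 4 4.483831056393683 0.99596956735821618
 1 4 9.7189442154485732 0.4594901646050486
EOF
# Prints:
# -1 1 0.732313
# -1 2 1.33585
# 1 4 1.05306
```

This groups rows by value rather than by position, so rows with the same pair need not be adjacent; as the answer notes, nothing is printed until END.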
