Reputation: 3517
I want to average the fourth column over groups of rows, where each group is defined by the value in the second column:
-1 1 22.776109913596883 0.19607208141710716
-1 1 4.2985901827923954 1.0388892840309705
-1 1 4.642271812306717 0.96197712195674756
-1 2 2.8032298255711794 1.5930763994471333
-1 2 2.9358628368936479 1.5211062387604053
-1 2 4.9987168801017106 0.8933811184867273
1 4 2.6211673161014915 1.7037291934441456
1 4 4.483831056393683 0.99596956735821618
1 4 9.7189442154485732 0.4594901646050486
The expected output would be
-1 1 0.732313
-1 2 1.33585
1 4 1.05306
I have done
awk '{sum+=$4} (NR%3)==0 {print $1,$2,sum/3;sum=0;}' test
which works, but I would like to (somehow) generalize the (NR%3)==0 condition so that awk notices when the value in the second column has changed and starts a new average at that point. For instance, the first three rows have 1 in the second column, so once 1 changes to 2, a new average needs to be calculated.
Does this make sense?
Upvotes: 0
Views: 528
Reputation: 4112
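You can also try this, which keeps a running sum and a row count per group, so each sum is divided by the number of rows in its own group:
awk '{key=$1" "$2; sum[key]+=$4; count[key]++} END { for (i in sum) {print i" " sum[i]/count[i]}}' test | sort -n
Test:
$ awk '{key=$1" "$2; sum[key]+=$4; count[key]++} END { for (i in sum) {print i" " sum[i]/count[i]}}' test | sort -n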
-1 1 0.732313
-1 2 1.33585
1 4 1.05306
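Since awk visits array keys in an unspecified order in a for (i in ...) loop, the command above depends on sort to get a stable output. As a minimal sketch of an alternative, the version below remembers the order in which each group first appears. The function name group_avg_ordered and the order and groups variables are made up for illustration, and it assumes the group key is columns 1 and 2 and the averaged value is column 4, as in the question.

```shell
# Hypothetical helper: average column 4 per (column 1, column 2) group,
# printing groups in order of first appearance instead of piping to sort.
group_avg_ordered() {
  awk '
    {
      key = $1 " " $2
      if (!(key in sum)) order[++groups] = key   # first time this key is seen
      sum[key] += $4
      count[key]++
    }
    END {
      for (g = 1; g <= groups; g++)
        print order[g], sum[order[g]] / count[order[g]]
    }
  ' "$@"
}

# Small demo with made-up rows; the groups do not need to be contiguous.
printf '%s\n' "1 4 x 2" "-1 1 x 1" "1 4 x 4" "-1 1 x 3" | group_avg_ordered
# prints:
# 1 4 3
# -1 1 2
```

With the file from the question this would be called as group_avg_ordered test.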
Upvotes: 0
Reputation: 7517
Try something like:
awk '{sum[$2] += $4; count[$2] += 1; }
END { for (k in sum) { print k " " sum[k]/count[k]; } }'
Not tested, but that is the idea.
With this method, the whole result is printed at the end; that may not be what you want if the input is an infinite stream, but judging by your example it should be fine.
If you also want to keep the first column, the same approach works: use both columns as the array key, for example sum[$1 " " $2].
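For the stream case, a minimal sketch of the "detect when the key changes" idea from the question could look like the following. It prints each average as soon as the group ends, so nothing is held until the end of input. It assumes that rows belonging to the same group are contiguous, that the key is columns 1 and 2, and that column 4 is averaged; the function name stream_avg and the variables k1, k2 and n are made up for illustration.

```shell
# Hypothetical helper: print one average per run of consecutive rows that
# share the same first two columns.
stream_avg() {
  awk '
    # Key differs from the previous row: flush the finished group first.
    NR > 1 && ($1 != k1 || $2 != k2) { print k1, k2, sum / n; sum = 0; n = 0 }
    # Accumulate the current row into the running group.
    { k1 = $1; k2 = $2; sum += $4; n++ }
    # Flush the last group, if any rows were read at all.
    END { if (n) print k1, k2, sum / n }
  ' "$@"
}

# Small demo with made-up rows.
printf '%s\n' "-1 1 x 1" "-1 1 x 3" "-1 2 x 5" | stream_avg
# prints:
# -1 1 2
# -1 2 5
```

With the file from the question this would be called as stream_avg test. Unlike the array-based version, it does not merge groups whose rows are scattered through the input.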
Upvotes: 1