Reputation: 2077
I have a data as:
2 2
3 3
4 3
2 2
1 1
56 4
3 2
4 1
2 2
4 2
5 5
3 3
5 6
6 4
I would like to print the moving average with each 5 past numbers for all columns.
Desire Output is:
2.4 2.2
13.2 2.6
13.2 2.4
13.2 2
13.2 2
13.8 2.2
3.6 2.4
3.6 2.6
3.8 3.6
4.6 4
Upvotes: 1
Views: 271
Reputation: 26481
All presented results are very memory intensive as the load the entire system into memory. While some delete the allocated memory, it is easier to just use a modular index. On top of that you don't really need to constantly recompute the sums (with floats I would argue differently if you have a high precision demand, but with integers it is not needed):
This solution assumes an equal amount of columns and a sliding window of n
:
awk -v n=5 '{for(i=1;i<=NF;++i) {s[i] = s[i] - a[FNR%n,i] + $i; a[FNR%n,i]=$i } }
(FNR >= n) { for(i=1;i<=NF;++i) printf "%s" (i==NF?ORS:OFS), s[i]/n }' file
Upvotes: 1
Reputation: 84569
You can do it with three-rules utilizing a "Sliding-Window" for values stored in two arrays a[]
and b[]
. You simply fill each element using a counter n
as the index, and then when n >= 5
output the sum and delete
the value at a[n-4]
and b[n-4]
(optional) and keep going. Your first rule is just that (with the addition of a loop and a sum of the last 5 values in each for the average.
Your second rule simply validates you have 2 fields and fills the a[]
and b[]
arrays. (you can add tests to ensure both field1 and field2 are numeric values -- that is left to you)
Your third rule is the END
rule which computes and outputs the final sum, e.g.
awk '
n >= 5 {
suma=sumb=0
for (i = n-4; i <= n; i++) {
suma+=a[i]
sumb+=b[i]
}
print suma/5"\t"sumb/5
delete a[n-4]
delete b[n-4]
}
NF >= 2 {
a[++n] = $1
b[n] = $2
}
END {
suma=sumb=0
for (i = n-4; i <= n; i++) {
suma+=a[i]
sumb+=b[i]
}
print suma/5"\t"sumb/5
}
' data
(instead of looping to compute the sums, you can keep running sums and subtract the values you unset
from the arrays -- up to you)
Example Use/Output
You can simply use an xterm and change to the directory where your data
file is stored (change the name as needed) and select-copy the awk
script above and middle-mouse-paste into the xterm. You will receive:
2.4 2.2
13.2 2.6
13.2 2.4
13.2 2
13.2 2
13.8 2.2
3.6 2.4
3.6 2.6
3.8 3.6
4.6 4
Keeping Running Sums
If you did want to keep running sums (suma
and sumb
) and remove the values at n-4
instead of looping, (which would be slightly more efficient), you could do:
awk '
n >= 5 {
print suma/5"\t"sumb/5
suma -= a[n-4]
sumb -= b[n-4]
}
NF >= 2 {
a[++n] = $1
b[n] = $2
suma += a[n]
sumb += b[n]
}
END {
print suma/5"\t"sumb/5
}
' data
The output is the same.
Upvotes: 3
Reputation: 133545
Could you please try following, adding one more way of doing this. Written and tested with shown samples in GNU awk
.
awk '
FNR==NR{
a[FNR]=$1
b[FNR]=$2
lines++
next
}
FNR<=(lines-4){
++count
for(i=count;i<=(4+count);i++){
sum1+=a[i]
sum2+=b[i]
}
print sum1/5,sum2/5
sum1=sum2=""
}
' Input_file Input_file | column -t
Upvotes: 2
Reputation: 785316
Here is another awk
using 2 pass:
awk -v OFS='\t' 'FNR == NR {
a[FNR] = $1
b[FNR] = $2
for (i=FNR-4; FNR>= 5 && i<=FNR; i++) {
sum1[FNR-4] += a[i]
sum2[FNR-4] += b[i]
}
tr = FNR
next
}
FNR <= tr-4 {
printf "%.2f%s%.2f\n", sum1[FNR]/5, OFS, sum2[FNR]/5
}' file file
2.40 2.20
13.20 2.60
13.20 2.40
13.20 2.00
13.20 2.00
13.80 2.20
3.60 2.40
3.60 2.60
3.80 3.60
4.60 4.00
Upvotes: 2