Kay
Kay

Reputation: 2077

Moving average using awk in all columns

I have a data as:

2       2
3       3
4       3
2       2
1       1
56      4
3       2
4       1
2       2
4       2
5       5
3       3
5       6
6       4

I would like to print the moving average with each 5 past numbers for all columns.

Desire Output is:

2.4     2.2
13.2    2.6
13.2    2.4
13.2    2
13.2    2
13.8    2.2
3.6     2.4
3.6     2.6
3.8     3.6
4.6     4

Upvotes: 1

Views: 271

Answers (4)

kvantour
kvantour

Reputation: 26481

All presented results are very memory intensive as the load the entire system into memory. While some delete the allocated memory, it is easier to just use a modular index. On top of that you don't really need to constantly recompute the sums (with floats I would argue differently if you have a high precision demand, but with integers it is not needed):

This solution assumes an equal amount of columns and a sliding window of n:

awk -v n=5 '{for(i=1;i<=NF;++i) {s[i] = s[i] - a[FNR%n,i] + $i; a[FNR%n,i]=$i } }
            (FNR >= n)  { for(i=1;i<=NF;++i) printf "%s" (i==NF?ORS:OFS), s[i]/n }' file

Upvotes: 1

David C. Rankin
David C. Rankin

Reputation: 84569

You can do it with three-rules utilizing a "Sliding-Window" for values stored in two arrays a[] and b[]. You simply fill each element using a counter n as the index, and then when n >= 5 output the sum and delete the value at a[n-4] and b[n-4] (optional) and keep going. Your first rule is just that (with the addition of a loop and a sum of the last 5 values in each for the average.

Your second rule simply validates you have 2 fields and fills the a[] and b[] arrays. (you can add tests to ensure both field1 and field2 are numeric values -- that is left to you)

Your third rule is the END rule which computes and outputs the final sum, e.g.

awk '
    n >= 5 {
        suma=sumb=0
        for (i = n-4; i <= n; i++) {
            suma+=a[i]
            sumb+=b[i]
        }
        print suma/5"\t"sumb/5
        delete a[n-4]
        delete b[n-4]
    }
    NF >= 2 {
        a[++n] = $1
        b[n] = $2
    }
    END {
        suma=sumb=0
        for (i = n-4; i <= n; i++) {
            suma+=a[i]
            sumb+=b[i]
        }
        print suma/5"\t"sumb/5
    }
' data

(instead of looping to compute the sums, you can keep running sums and subtract the values you unset from the arrays -- up to you)

Example Use/Output

You can simply use an xterm and change to the directory where your data file is stored (change the name as needed) and select-copy the awk script above and middle-mouse-paste into the xterm. You will receive:

2.4     2.2
13.2    2.6
13.2    2.4
13.2    2
13.2    2
13.8    2.2
3.6     2.4
3.6     2.6
3.8     3.6
4.6     4

Keeping Running Sums

If you did want to keep running sums (suma and sumb) and remove the values at n-4 instead of looping, (which would be slightly more efficient), you could do:

awk '
    n >= 5 {
        print suma/5"\t"sumb/5
        suma -= a[n-4]
        sumb -= b[n-4]
    }
    NF >= 2 {
        a[++n] = $1
        b[n] = $2
        suma += a[n]
        sumb += b[n]
    }
    END {
        print suma/5"\t"sumb/5
    }
' data

The output is the same.

Upvotes: 3

RavinderSingh13
RavinderSingh13

Reputation: 133545

Could you please try following, adding one more way of doing this. Written and tested with shown samples in GNU awk.

awk '
FNR==NR{
  a[FNR]=$1
  b[FNR]=$2
  lines++
  next
}
FNR<=(lines-4){
  ++count
  for(i=count;i<=(4+count);i++){
    sum1+=a[i]
    sum2+=b[i]
  }
  print sum1/5,sum2/5
  sum1=sum2=""
}
' Input_file  Input_file | column -t

Upvotes: 2

anubhava
anubhava

Reputation: 785316

Here is another awk using 2 pass:

awk -v OFS='\t' 'FNR == NR {
   a[FNR] = $1
   b[FNR] = $2
   for (i=FNR-4; FNR>= 5 && i<=FNR; i++) {
      sum1[FNR-4] += a[i]
      sum2[FNR-4] += b[i]
   }
   tr = FNR
   next
}
FNR <= tr-4 {
   printf "%.2f%s%.2f\n", sum1[FNR]/5, OFS, sum2[FNR]/5
}' file file
2.40    2.20
13.20   2.60
13.20   2.40
13.20   2.00
13.20   2.00
13.80   2.20
3.60    2.40
3.60    2.60
3.80    3.60
4.60    4.00

Upvotes: 2

Related Questions