How to calculate average of subsets of columns in csv?

Question

I have a very large CSV file that looks something like this:

#       col1    col2    col3
1       1       7       9
2       2       8       10
3       3       9       11
4       4       10      12
5       5       11      13
6       6       12      14

For every column, I would like to average each pair of consecutive rows, then move on to the next pair. For instance, in col1 the average of 1 and 2 becomes the first cell of the resulting column, and the average of 3 and 4 becomes the second. The resulting column is therefore half the length of the original col1.

The output of the script should look as follows for the provided sample file above:

#       col1    col2    col3
1       1.5     7.5     9.5
2       3.5     9.5     11.5
3       5.5     11.5    13.5

This seems like a good problem to solve with AWK, but I'm still new to using AWK.

Any pointers are appreciated.

mbadawi23 · Accepted Answer

I took the liberty of generalizing Jonathan Leffler's answer to cover an arbitrary window size n, where n is both the number of rows averaged per group and the offset to the next group.

I wrote an awk script (I called it avewithoffset) as follows:

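#!/usr/bin/awk -f
# Average every n data rows, column by column (column 1 is the row index).
BEGIN {
    FS = OFS = "\t"    # tab-separated input and output
    n = 5              # rows per group (window size and offset)
}
NR == 1 { print; next }                                      # pass the header through
(NR-1) % n != 0 { for (i = 2; i <= NF; i++) old[i] += $i }   # accumulate rows inside a group
(NR-1) % n == 0 {                                            # last row of a group
    for (i = 2; i <= NF; i++) { $i = ($i + old[i]) / n; old[i] = 0 }
    $1 = int((NR-1) / n)                                     # renumber the output rows
    print
}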

Notice that n=5 here, so every five data rows are averaged into one output row.
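Because n is hard-coded in the BEGIN block, changing the window means editing the script. A minimal sketch of one alternative, assuming a POSIX shell and awk, is to pass n in with -v. The function name avg_rows and its argument order are made up for illustration and are not part of the original script; the logic is the same per-group averaging, written with a single accumulator rule.

```sh
# Hypothetical wrapper (name and argument order are assumptions):
#   avg_rows N FILE   averages every N data rows of a tab-separated FILE
avg_rows() {
    awk -v n="$1" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 { print; next }                        # pass the header through
        { for (i = 2; i <= NF; i++) sum[i] += $i }     # accumulate every data row
        (NR - 1) % n == 0 {                            # last row of a group
            for (i = 2; i <= NF; i++) { $i = sum[i] / n; sum[i] = 0 }
            $1 = (NR - 1) / n                          # renumber the output rows
            print
        }
    ' "$2"
}
```

With this sketch, avg_rows 2 file would cover the pairwise case from the question, and avg_rows 5 file the case shown in this answer.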

I fed the following file to the avewithoffset script:

#   col1    col2    col3
1   1       16      31
2   2       17      32
3   3       18      33
4   4       19      34
5   5       20      35
6   6       21      36
7   7       22      37
8   8       23      38
9   9       24      39
10  10      25      40
11  11      26      41
12  12      27      42
13  13      28      43
14  14      29      44
15  15      30      45

And the resulting file looks like:

#   col1    col2    col3
1   3       18      33
2   8       23      38
3   13      28      43
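The sample has 15 data rows, an exact multiple of n=5, so every row lands in a complete group. When the row count is not a multiple of n, the script above prints nothing for the leftover rows. If those rows should be reported too, one possible approach (an assumption about the desired behavior, not something the question specifies) is to flush the partial group in an END block and divide by the number of rows actually seen. The names avg_rows_flush and emit are hypothetical.

```sh
# Hypothetical variant: also averages a trailing partial group.
#   avg_rows_flush N FILE
avg_rows_flush() {
    awk -v n="$1" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 { print; next }                  # pass the header through
        {
            for (i = 2; i <= NF; i++) sum[i] += $i
            nf = NF                              # remember the column count for END
            count++                              # rows seen in the current group
        }
        count == n { emit() }                    # a full group is complete
        END { if (count > 0) emit() }            # leftover rows, averaged over count
        function emit(   i, line) {
            line = ++group                       # output row index
            for (i = 2; i <= nf; i++) {
                line = line OFS (sum[i] / count)
                sum[i] = 0
            }
            print line
            count = 0
        }
    ' "$2"
}
```

The last output row is then an average over fewer than n rows, which may or may not be what is wanted, so treat this as a sketch rather than a drop-in replacement.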
