leonard vertighel

Reputation: 1068

Awk average of n data in each column

"Using awk to bin values in a list of numbers" provides a solution for averaging each set of 3 points in a single column using awk.

How can it be extended to an arbitrary number of columns while maintaining the format? For example:

2457135.564106 13.249116 13.140903 0.003615 0.003440
2457135.564604 13.250833 13.139971 0.003619 0.003438
2457135.565067 13.247932 13.135975 0.003614 0.003432
2457135.565576 13.256441 13.146996 0.003628 0.003449
2457135.566039 13.266003 13.159108 0.003644 0.003469
2457135.566514 13.271724 13.163555 0.003654 0.003476
2457135.567011 13.276248 13.166179 0.003661 0.003480
2457135.567474 13.274198 13.165396 0.003658 0.003479
2457135.567983 13.267855 13.156620 0.003647 0.003465
2457135.568446 13.263761 13.152515 0.003640 0.003458

Averaging the values every 5 lines should output something like:

2457135.564916  13.253240   13.143976   0.003622    0.003444
2457135.567324  13.270918   13.161303   0.003652    0.003472

where the first row is the average of lines 1-5 and the second row is the average of lines 6-10.

Upvotes: 2

Views: 4245

Answers (1)

Jonathan Leffler

Reputation: 753475

The accepted answer to "Using awk to bin values in a list of numbers" is:

awk '{sum+=$1} NR%3==0 {print sum/3; sum=0}' inFile

The obvious extension to average all the columns is:

awk 'BEGIN { N = 3 }
     { for (i = 1; i <= NF; i++) sum[i] += $i }
     NR % N == 0 { for (i = 1; i <= NF; i++)
                   {
                       printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " ")
                       sum[i] = 0
                   }
                 }' inFile

Because the block size is held in N, grouping blocks of 5 rows only requires changing the single 3 in the BEGIN block to 5. The script ignores a trailing partial block of up to N-1 rows at the end of the file. If you want that block too, you can add an END block that prints a suitable average when NR % N != 0.

For the sample input data, the output I got from the script above was:

2457135.564592 13.249294 13.138950 0.003616 0.003437
2457135.566043 13.264723 13.156553 0.003642 0.003465
2457135.567489 13.272767 13.162732 0.003655 0.003475

You can make the code much more complex if you want to analyze what the output formats should be. I've simply used %.6f to ensure 6 decimal places.
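
As one possible reading of "analyzing the output formats", here is a minimal sketch that prints each column with the number of decimal places found in the first input row, instead of a fixed %.6f. The wrapper function name avg_keep_decimals is hypothetical, and the sketch assumes plain decimal notation (no exponents) and the same precision in every row of a column.

```shell
# Hypothetical helper (not part of the original answer): average every 3 rows,
# printing each column with the number of decimals seen in the first row.
avg_keep_decimals() {
    awk 'BEGIN { N = 3 }
         NR == 1 {
             # Record how many digits follow the decimal point in each column.
             for (i = 1; i <= NF; i++) {
                 p = index($i, ".")
                 dec[i] = (p > 0) ? length($i) - p : 0
             }
         }
         { for (i = 1; i <= NF; i++) sum[i] += $i }
         NR % N == 0 {
             for (i = 1; i <= NF; i++) {
                 # Build the format per column, e.g. "%.6f" when dec[i] is 6.
                 printf("%." dec[i] "f%s", sum[i] / N, (i == NF) ? "\n" : " ")
                 sum[i] = 0
             }
         }' "$@"
}
```

It could be called as avg_keep_decimals inFile. Since every column in the sample data has 6 decimal places, this only makes a visible difference for input whose columns have mixed precision.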

If you want N to be a command-line parameter, you can use the -v option to relay the variable setting to awk:

awk -v N="${variable:-3}" \
    '{ for (i = 1; i <= NF; i++) sum[i] += $i }
     NR % N == 0 { for (i = 1; i <= NF; i++)
                   {
                       printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " ")
                       sum[i] = 0
                   }
                 }' inFile

When invoked with $variable set to 5, the output generated from the sample data is:

2457135.565078 13.254065 13.144591 0.003624 0.003446
2457135.567486 13.270757 13.160853 0.003652 0.003472
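
The END block mentioned earlier for a trailing partial block could look roughly like the sketch below. It is an illustration under stated assumptions, not the only way to do it: it swaps the NR % N test for an explicit row counter so that the same code can divide by the actual number of rows in the last block, and it saves NF in nf because not every awk keeps NF available in END. The names avg_blocks, emit, count and nf are hypothetical.

```shell
# Hypothetical helper (not part of the original answer): average every N rows
# of whitespace-separated numeric columns, including a final partial block.
# Usage: avg_blocks N [file ...]   (reads standard input if no file is given)
avg_blocks() {
    n="$1"; shift
    awk -v N="$n" '
        # Accumulate column sums and remember the field count of this row.
        { for (i = 1; i <= NF; i++) sum[i] += $i; nf = NF; count++ }
        # A full block of N rows has been read: print it.
        count == N { emit() }
        # Leftover rows (fewer than N) at the end of the input.
        END { if (count > 0) emit() }
        # Print the averages of the current block and reset the accumulators.
        function emit(   i) {
            for (i = 1; i <= nf; i++) {
                printf("%.6f%s", sum[i] / count, (i == nf) ? "\n" : " ")
                sum[i] = 0
            }
            count = 0
        }
    ' "$@"
}
```

Called as avg_blocks 3 inFile on the 10-line sample, this would print a fourth line that is simply the tenth row on its own, because that row forms a partial block of size 1.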

Upvotes: 5
