user3389597

Reputation: 471

AWK to average over columns from multiple files

I have several similar files, e.g. c1.txt, c2.txt, etc., and I want to average the 7th column line by line across all the files and write the output to another file. Each file has 45120 lines. To calculate the average of the 7th column I wrote:

awk '{a[FNR]+=$7;b[FNR]++;}END{for(i=1;i<=FNR;i++)print a[i]/b[i];}' c* > ave_result.txt

It prints only half of the average for each line of the 7th column, and it prints only up to line 264. I checked the output at line 264, and it is not even half of the average over those files.

How should I modify the awk command to calculate the correct average of each line of the 7th column? Thank you. For example, the first few lines of the 1st file are

1 1 1 1 1 1 2.559346e-08 2.080054e-10
1 1 1 1 1 2 1.398551e-09 2.709745e-09
1 1 1 1 1 3 -7.939651e-10 -1.560374e-09

and similarly in the 2nd file:

2 1 1 1 1 1 2.579924e-08 2.756949e-09
2 1 1 1 1 2 -1.333798e-10 1.700513e-09
2 1 1 1 1 3 2.334223e-09 -3.592740e-09

and so on. I would like to calculate the average of the 7th column across all the files I have, so the expected output is

2.579924e-08
6.3259e-10
6.3259e-10

How should I edit the awk command if I have 200 such files, each with 45120 rows?

Upvotes: 1

Views: 1766

Answers (1)

Kent

Reputation: 195039

If you want to get the average of col7 in each file:

You fill a[] and b[] while reading one file, but you don't clear them when you start processing the next file, so the result won't be correct. In fact, no array is needed for this problem. You could try this (I didn't test it):

awk 'FNR==1{if(c)print s/c; s=0;c=0}{s+=$7;c++}END{if(c)print s/c}' c* > result.txt

If you want to get the average of col7 over all lines of all files:

awk '{s+=$7}END{print s/NR}' c* > result.txt
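A slightly more defensive variant of the one-liner above, as a sketch: it assumes that some input files may contain blank or short lines, which NR would count even though they contribute nothing to the sum. The function name grand_avg_col7 is made up for illustration.

```shell
# Grand average of column 7 over every line of every file given.
# Lines with fewer than 7 fields (e.g. blank lines) are skipped, so they
# do not inflate the divisor the way NR would.
grand_avg_col7() {
    awk 'NF >= 7 { s += $7; n++ } END { if (n) print s / n }' "$@"
}

# Usage, with the file names from the question:
#   grand_avg_col7 c*.txt > result.txt
```

Note that c* matches every file in the directory whose name starts with c, so a narrower pattern such as c*.txt is assumed in the usage line.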

EDIT

As @PM77-1 commented, I may not have understood your requirement correctly. If you want 45120 output lines, it works like this:

sum $7 of all line 1 from all files, and get average, output line 1
sum $7 of all line 2 from all files, and get average, output line 2
...
sum $7 of all line 45120 from all files, and get average, output line 45120

You don't actually need b[]. You can either use a counter to count the files, or use awk's built-in ARGC (ARGC-1 is the number of file arguments here):

awk '{a[FNR]+=$7}END{for(i=1;i<=FNR;i++)print a[i]/(ARGC-1);}' c* >...
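The counter alternative mentioned above could look like the following sketch. It keeps one counter per line number instead of dividing by ARGC-1, on the assumption that the files might not all have the same number of lines; it also tracks the largest FNR seen rather than relying on the last file's FNR in END. The function name avg_col7 and the %.6e output format are illustrative choices, not part of the original command.

```shell
# Per-line average of column 7 across several files.
# sum[FNR] accumulates the value found on line FNR of each file;
# cnt[FNR] counts how many files contributed to that line, so files of
# unequal length are still averaged correctly.
avg_col7() {
    awk '
        { sum[FNR] += $7; cnt[FNR]++; if (FNR > max) max = FNR }
        END { for (i = 1; i <= max; i++) printf "%.6e\n", sum[i] / cnt[i] }
    ' "$@"
}

# Usage, with the file names from the question:
#   avg_col7 c*.txt > ave_result.txt
```

The usage line assumes c*.txt rather than c*, because c* also picks up any other file in the directory whose name starts with c, and every such file is added into the sums.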

Upvotes: 3
