Vijay

Reputation: 985

calculating average using awk from multiple files

I have 500 files named fort.1, fort.2, ..., fort.500. Each file contains 800 lines of data in the following format:

1 0.485
2 0.028
3 0.100
4 0.979
5 0.338
6 0.891
7 0.415
8 0.368
9 0.245
10 0.489

I want to average the second column line by line across all files. In other words, take the second-column value on the first line of every file, average those values, and store the result in "output.file". Then do the same for the second line and append it to the same "output.file", and so on for every line. I tried the paste command but failed to get what I wanted. Is there any way to do this in awk?

Appreciate any help. Thanks

Upvotes: 8

Views: 11286

Answers (4)

Steve

Reputation: 54392

Here's a quick way using paste and awk:

paste fort.* | awk '{ sum = 0; for (i = 2; i <= NF; i += 2) sum += $i; print $1, sum / (NF / 2) }' > output.file

Alternatively, like some of the other answers, you can do it in awk alone. This version pipes the result through sort to get numerically sorted output:

awk '{ sum[$1]+=$2; cnt[$1]++ } END { for (i in sum) print i, sum[i]/cnt[i] | "sort -n" }' fort.*

Upvotes: 5

Dwight Holman

Reputation: 1610

My understanding: each file is a set of measurements at a particular location. You want to aggregate the measurements across all locations, averaging the values from the same row of each file into a new file.

Assuming the first column can be treated as an ID for the row (and there are 800 measurements in a file):

cat fort.* | awk '
BEGIN { 
    for (i = 1; i <= 800; i++)
        total[i] = 0
}

{ total[$1] += $2 } 

END {
    for (i = 1; i <= 800; i++)
        print i, total[i]/500
}
'

First, we initialize an array to store the sum for a row across all files.

Then, we loop through the concatenated files. We use the first column as a key for the row, and we sum into the array.

Finally, we loop over the array and print the average value by row across all files.
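If you would rather not hard-code the 800 rows and 500 files, a minimal sketch along the same lines could count both while reading. It assumes the first column is a numeric row ID starting at 1 and that every ID appears once in each file; the function name avg_by_row is made up for illustration.

```shell
# Hypothetical helper: average column 2 per row ID across all files given as arguments.
# Assumes column 1 is a numeric ID (1, 2, 3, ...) that occurs once in every file.
avg_by_row() {
    awk '
        FNR == 1 { nfiles++ }                    # first line of a file: count the file
        {
            total[$1] += $2                      # sum column 2 under its row ID
            if ($1 + 0 > max) max = $1 + 0       # remember the largest ID seen
        }
        END {
            for (i = 1; i <= max; i++)
                print i, total[i] / nfiles       # average = sum / number of files
        }
    ' "$@"
}
```

It would be called as: avg_by_row fort.* > output.file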

Upvotes: 1

Guru

Reputation: 16974

awk without any assumption on the 1st column:

awk '{a[FNR]+=$2;b[FNR]++;}END{for(i=1;i<=FNR;i++)print i,a[i]/b[i];}' fort.*
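One caveat: in the END block, FNR normally still holds the line count of the last file read, so if that file happened to be shorter than the others, the trailing rows would be skipped. That does not matter for the question as stated (every file has 800 lines), but a sketch that tracks the longest file instead might look like this. The function name avg_by_position is made up for illustration.

```shell
# Hypothetical helper: average column 2 by line position across all files,
# tolerating files of different lengths.
avg_by_position() {
    awk '
        { sum[FNR] += $2; n[FNR]++ }     # per-position sum and count of contributing files
        FNR > rows { rows = FNR }        # length of the longest file seen so far
        END {
            for (i = 1; i <= rows; i++)
                print i, sum[i] / n[i]   # divide by how many files had this line
        }
    ' "$@"
}
```

It would be called as: avg_by_position fort.* > output.file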

Upvotes: 8

lc.

Reputation: 116458

Assuming the first column is an ID:

cat fort.* | awk '{sum[$1] += $2; counts[$1]++;} END {for (i in sum) print i, sum[i]/counts[i];}' 

Upvotes: 3
