Moeb
Moeb

Reputation: 10861

How to reduce a set of lines to take the average?

I have a file with lines like these (columns are tab seperated)

2    1.414455    3.70898
2    2.414455    3.80898
2    3.414455    3.90898
2    1.414455    3.90898
4    4.414455    7.23898
4    3.414455    6.23898
4    5.414455    8.23898

i.e. there are consecutive lines where the first column is an integer, and rest two columns are floats.

I want to reduce them as below

2    2.164455    3.75898
4    4.414455    7.23898

where I keep the first columns, and take the averages of the second and third columns for all elements with same first columns. The number of consecutive lines with same first elements might be different, but they will always be consecutive.

I can do this in perl, but was wondering if there is a simpler bash / sed / awk mix that can do the same for me?

Upvotes: 1

Views: 132

Answers (1)

Guru
Guru

Reputation: 16974

Using awk:

awk '{a[$1]+=$2;b[$1]+=$3;c[$1]++;}END{for(i in c)print i, a[i]/c[i],b[i]/c[i];}' file
2 2.16445 3.83398
4 4.41446 7.23898

Using 3 different arrays: a and b to keep the sum of 2nd and 3rd columns, c to keep the count of elements. At the end, calculating the average and printing it.

Upvotes: 3

Related Questions