Reputation: 10861
I have a file with lines like these (columns are tab seperated)
2 1.414455 3.70898
2 2.414455 3.80898
2 3.414455 3.90898
2 1.414455 3.90898
4 4.414455 7.23898
4 3.414455 6.23898
4 5.414455 8.23898
i.e. there are consecutive lines where the first column is an integer, and rest two columns are floats.
I want to reduce them as below
2 2.164455 3.75898
4 4.414455 7.23898
where I keep the first columns, and take the averages of the second and third columns for all elements with same first columns. The number of consecutive lines with same first elements might be different, but they will always be consecutive.
I can do this in perl, but was wondering if there is a simpler bash / sed / awk mix that can do the same for me?
Upvotes: 1
Views: 132
Reputation: 16974
Using awk:
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]++;}END{for(i in c)print i, a[i]/c[i],b[i]/c[i];}' file
2 2.16445 3.83398
4 4.41446 7.23898
Using 3 different arrays: a and b to keep the sum of 2nd and 3rd columns, c to keep the count of elements. At the end, calculating the average and printing it.
Upvotes: 3