user1308144

Reputation: 475

obtain averages of field 2 after grouping by field 1 with awk

I have a file with two numeric fields, sorted numerically on field 1. The numbers in field 1 range from 1 to 200000 and those in field 2 lie between 0 and 1. I want to obtain the averages of both field 1 and field 2 in batches of a fixed number of rows.

Here is an example input, to be averaged in batches of 4 rows:

1 0.12
1 0.34
2 0.45
2 0.40
50 0.60
301 0.12
899 0.13
1003 0.14
1300 0.56
1699 0.43
2100 0.25
2500 0.56

The output would be:

1.5 0.327
563.25  0.247
1899.75 0.45

Upvotes: 1

Views: 39

Answers (1)

janos

Reputation: 124646

Here you go:

awk -v n=4 '{s1 += $1; s2 += $2; if (++i % n == 0) { print s1/n, s2/n; s1=s2=0; } }'

Explanation:

  • Set n=4, the batch size, with -v
  • Collect the sums: the sum of the 1st column in s1, the sum of the 2nd in s2
  • Increment the counter i by 1 (awk variables start out as 0, so there is no need to initialize it)
  • When i is a multiple of n, print the two averages and reset the sum variables
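Note that the one-liner prints only complete batches, so if the number of rows is not a multiple of n, the leftover rows at the end are silently dropped. If you want those averaged too, here is a minimal sketch of a variant with an END block. The function name batch_avg is just a hypothetical wrapper for illustration, and averaging the remainder over its actual row count is my assumption about what you would want:

```shell
# batch_avg N [FILE...]: print the averages of columns 1 and 2 for every N rows.
# batch_avg is a hypothetical helper name, not part of awk or the original answer.
batch_avg() {
  n=$1
  shift
  awk -v n="$n" '
    # accumulate both sums and count the rows seen in the current batch
    { s1 += $1; s2 += $2; i++ }
    # full batch: print the averages, then reset sums and counter
    i == n { print s1 / n, s2 / n; s1 = s2 = i = 0 }
    # incomplete final batch (assumption: average over the rows actually left)
    END { if (i > 0) print s1 / i, s2 / i }
  ' "$@"
}
```

Call it as batch_avg 4 file, or pipe the data in on stdin. The counter is reset after each batch instead of using the modulo test, so at END it holds exactly the number of leftover rows. To get a fixed number of decimals, print could be swapped for printf with a format such as "%.3f %.3f\n".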

Upvotes: 4
