Reputation: 107
I was not sure how to formulate the question but here it is.
I have a long file with 12/24/36/48... lines.
The file looks like this.
0 413
1 388
2 272
3 289
4 42
5 45
6 423
7 522
8 949
9 984
10 371
11 990
0 412
1 370
2 254
3 255
4 42
5 58
6 391
7 546
8 938
9 985
10 381
11 992
Now what I want to do is average all lines beginning with 0, for example (413+412)/2 for the 0 line, then every line beginning with 1, and so on up to 11, so the output has only 12 lines, each holding the average for one index.
I'm really struggling. I know how to use awk to match every line beginning with a given number, but it gets a little bit confusing from there.
Upvotes: 2
Views: 1133
Reputation: 84531
Bash provides an easy solution (updated to keep individual count of each index 0 .. 11). An additional update was provided setting the integer attribute for the arrays allowing a more succinct increment of values within arithmetic operators:
#!/bin/bash

[ -n "$1" ] && [ -f "$1" ] || {    # test filename provided & is a regular file
    printf "\n Error: invalid input. Usage: %s <input_file>\n\n" "${0##*/}"
    exit 1
}

declare -ai cnt    # count of how many times 0..11 encountered
declare -ai sum    # array holding running total of each 0 .. 11

while read -r idx val || [ -n "$val" ]; do    # read each line (even if the last lacks a newline)
    ((sum[idx]+=val))                         # keep sum of each 0 .. 11
    ((cnt[idx]++))                            # keep cnt of each 0 .. 11
done <"$1"

## for each index seen, compute average and print (using bc for division)
printf "\nThe sums and averages of each line index are:\n\n"
for i in "${!sum[@]}"; do                     # iterate over the indexes actually set
    printf " %4s %8s / %-3s = %.3f\n" "$i" "${sum[i]}" "${cnt[i]}" \
        "$(echo "scale=4; ${sum[i]}/${cnt[i]}" | bc)"
done

exit 0
output:
$ bash avgnthln.sh dat/avgln.dat
The sums and averages of each line index are:
0 825 / 2 = 412.500
1 758 / 2 = 379.000
2 526 / 2 = 263.000
3 544 / 2 = 272.000
4 84 / 2 = 42.000
5 103 / 2 = 51.500
6 814 / 2 = 407.000
7 1068 / 2 = 534.000
8 1887 / 2 = 943.500
9 1969 / 2 = 984.500
10 752 / 2 = 376.000
11 1982 / 2 = 991.000
Upvotes: 1
Reputation: 80921
awk '{sum[$1]=sum[$1] + $2; nr[$1]++} END {for (a in sum) {print a, sum[a]/nr[a]}}' file
Keep a running sum of the second field indexed by the first field. Also count how many of each first field you see. Then loop over all the seen fields and print out the field and the average.
If you want the output in order, you can pipe it to sort (sort -n for numeric order), or use a numeric loop in the END block (if you know the minimum and maximum values ahead of time). You could also track the maximum value in the main action block and use that, but this was simpler.
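For what it's worth, here is a minimal sketch of that last variant: it tracks the largest index in the main block and then loops numerically in END, so the output comes out in order without sort. The function name avg_by_index is made up for illustration, and the sketch assumes the first field is always a non-negative integer.

```shell
#!/bin/sh
# avg_by_index [FILE...]: print "index average" per first-field value,
# in ascending numeric order. Reads stdin when no file is given.
# Hypothetical helper; assumes field 1 is a non-negative integer.
avg_by_index() {
    awk '
        NF >= 2 {
            sum[$1] += $2                       # running total per index
            cnt[$1]++                           # lines seen per index
            if ($1 + 0 > max) max = $1 + 0      # remember the largest index
        }
        END {
            for (i = 0; i <= max; i++)
                if (i in cnt)                   # skip indexes that never appeared
                    print i, sum[i] / cnt[i]
        }
    ' "$@"
}

# Usage: avg_by_index file
```

The `i in cnt` check avoids a division by zero if some index between 0 and the maximum never shows up in the file.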
Upvotes: 3
Reputation: 184995
awk '$1 == 0{c++;r+=$2}END{print r/c}' file
412.5
Feel free to improve it for other lines...
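One possible way to extend this to the other lines, as a sketch only: pass the wanted index in with -v instead of hard-coding 0. The function name avg_for_index and the variable k are made up for illustration.

```shell
#!/bin/sh
# avg_for_index K [FILE...]: average of field 2 over the lines whose
# field 1 equals K. Reads stdin when no file is given.
# Hypothetical helper, not part of the original one-liner.
avg_for_index() {
    k=$1
    shift
    # Print nothing when no line matches, instead of dividing by zero.
    awk -v k="$k" '$1 == k { c++; r += $2 } END { if (c) print r / c }' "$@"
}

# Usage: avg_for_index 0 file
```

Calling it once per index (0 through 11) rereads the file each time, so for all twelve averages a single pass with arrays is the cheaper choice.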
Upvotes: 3