SaKer

Reputation: 107

Average of every nth line in bash

I was not sure how to formulate the question but here it is.

I have a long file with 12/24/36/48... lines.

The file looks like this.

0 413
1 388
2 272
3 289
4 42
5 45
6 423
7 522
8 949
9 984
10 371
11 990
0 412
1 370
2 254
3 255
4 42
5 58
6 391
7 546
8 938
9 985
10 381
11 992

What I want to do is average all lines beginning with 0, for example (413+412)/2 for the 0 line, then every line beginning with 1, and so on up to 11. The output would then have only 12 lines, each holding the average of every nth line.

I'm really struggling. I know how to awk every line beginning with a number, but it gets a little bit confusing from there.

Upvotes: 2

Views: 1133

Answers (3)

David C. Rankin

Reputation: 84531

Bash provides an easy solution (updated to keep an individual count for each index 0 .. 11). A further update sets the integer attribute on the arrays, which allows a more succinct increment of the values inside the arithmetic operators:

#!/bin/bash

[ -n "$1" ] && [ -f "$1" ] || {     # test filename provided & is a regular file
    printf "\n Error: invalid input. Usage:  %s <input_file>\n\n" "${0##*/}" >&2
    exit 1
}

declare -ai cnt      # count of how many times 0..11 encountered
declare -ai sum      # array holding running total of each 0 .. 11

while read -r idx val || [ -n "$val" ]; do      # read each line
    ((sum[idx]+=val))                           # keep sum of each 0 .. 11
    ((cnt[idx]++))                              # keep cnt of each 0 .. 11
done <"$1"

## for each element in the array, compute average and print (using bc for division)
printf "\nThe sums and averages of each line index are:\n\n"
for ((i=0; i<${#sum[@]}; i++)); do
    printf "  %4s  %8s / %-3s = %s\n" "$i" "${sum[i]}" "${cnt[i]}" "$(printf "%.3f" "$(printf "scale=4;${sum[i]}/${cnt[i]}\n" | bc)")"
done

exit 0

Output:

$ bash avgnthln.sh dat/avgln.dat

The sums and averages of each line index are:

     0       825 / 2   = 412.500
     1       758 / 2   = 379.000
     2       526 / 2   = 263.000
     3       544 / 2   = 272.000
     4        84 / 2   = 42.000
     5       103 / 2   = 51.500
     6       814 / 2   = 407.000
     7      1068 / 2   = 534.000
     8      1887 / 2   = 943.500
     9      1969 / 2   = 984.500
    10       752 / 2   = 376.000
    11      1982 / 2   = 991.000
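
As a small aside on the integer attribute mentioned above: with declare -i, a plain += adds arithmetically instead of appending text. A minimal sketch, where the array names plain_arr and int_arr are only illustrative and not part of the script:

```shell
declare -a plain_arr    # ordinary indexed array: += appends text
declare -ai int_arr     # integer attribute: += adds numerically

plain_arr[0]=413; plain_arr[0]+=412
int_arr[0]=413;   int_arr[0]+=412

echo "${plain_arr[0]}"  # 413412
echo "${int_arr[0]}"    # 825
```

This needs bash; declare is not available in plain POSIX sh.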

Upvotes: 1

Etan Reisner

Reputation: 80921

awk '{sum[$1]=sum[$1] + $2; nr[$1]++} END {for (a in sum) {print a, sum[a]/nr[a]}}' file

Keep a running sum of the second field indexed by the first field. Also count how many of each first field you see. Then loop over all the seen fields and print out the field and the average.

The for (a in sum) loop visits the keys in an unspecified order. If you want the output in order, you can pipe to sort or use a numeric loop in the END block (if you know the minimum/maximum values ahead of time). You could also track the max value in the main action block and use that, but this was simpler.
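
To illustrate the ordering options, here is a minimal sketch of both, assuming the first field is always a non-negative integer. The function names avg_sorted and avg_loop are made up for this example:

```shell
# Option 1: let awk print in arbitrary order, then sort numerically on field 1.
avg_sorted() {
    awk '{sum[$1] += $2; nr[$1]++}
         END {for (a in sum) print a, sum[a]/nr[a]}' "$1" | sort -n
}

# Option 2: remember the largest key seen and loop over 0..max in the END block.
# Assumes keys are non-negative integers; keys that never appear are skipped.
avg_loop() {
    awk '{sum[$1] += $2; nr[$1]++; if ($1 + 0 > max) max = $1 + 0}
         END {for (i = 0; i <= max; i++) if (i in nr) print i, sum[i]/nr[i]}' "$1"
}
```

Either one is called with the data file as its only argument, e.g. avg_loop file. The second variant avoids the extra sort process but relies on the integer-key assumption.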

Upvotes: 3

Gilles Quénot

Reputation: 184995

awk '$1 == 0{c++;r+=$2}END{print r/c}' file
412.5

Feel free to improve it for other lines...
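
One possible way to reuse this for any single index is to pass the index in with awk's -v option. This is only a sketch; the helper name avg_for_index and the variable k are hypothetical:

```shell
# avg_for_index INDEX FILE
# Average of field 2 over the lines whose field 1 equals INDEX.
# Prints nothing if no line matches, which avoids a division by zero.
avg_for_index() {
    awk -v k="$1" '$1 == k {c++; r += $2}
                   END {if (c) print r/c}' "$2"
}
```

For example, avg_for_index 7 file would print the average for the lines beginning with 7.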

Upvotes: 3
