Reputation: 35
I have a file with thousands of numbers on column 1 and each sequence of these numbers are associated with a single person. Would someone have any idea on how can I create a shell script to sum column 1 for that specific person, eg:
John is 10+20+30+50 = 110
Output of the script would be: John 110
and so on and so forth..
I have tried with while, for, etc but I can't associate the sum to the person :(
Example of the file:
10 John
20 John
30 John
50 John
10 Paul
10 Paul
20 Paul
20 Paul
20 Robert
30 Robert
30 Robert
60 Robert
80 Robert
40 Robert
40 Robert
40 Robert
15 Mike
30 Mike
Upvotes: 0
Views: 656
Reputation: 34474
Assumptions:
numbers.dat
One bash
solution using a couple associative arrays to keep track of our numbers:
unset total count
declare -A total count
while read -r number name
do
(( total[${name}] += $number))
(( count[${name}] ++ ))
done < numbers.dat
typeset -p total count
This generates:
declare -A total=([Mike]="45" [Robert]="340" [John]="110" [Paul]="60" )
declare -A count=([Mike]="2" [Robert]="8" [John]="4" [Paul]="4" )
If we want integer based averages (ie, no decimal places):
for i in ${!total[@]}
do
printf "%-10s %5d / %-5d = %5d\n" "${i}" "${total[${i}]}" "${count[${i}]}" $(( ${total[${i}]} / ${count[${i}]} ))
done
This generates:
Mike 45 / 2 = 22
Robert 340 / 8 = 42
John 110 / 4 = 27
Paul 60 / 4 = 15
If we want the averages to include, say, 2 decimal places:
for i in ${!total[@]}
do
printf "%-10s %5d / %-5d = %5.2f\n" "${i}" "${total[${i}]}" "${count[${i}]}" $( bc <<< "scale=2;${total[${i}]} / ${count[${i}]}" )
done
This generates:
Mike 45 / 2 = 22.50
Robert 340 / 8 = 42.50
John 110 / 4 = 27.50
Paul 60 / 4 = 15.00
Output sorted by name:
for i in ${!total[@]}
do
printf "%-10s %5d / %-5d = %5.2f\n" "${i}" "${total[${i}]}" "${count[${i}]}" $( bc <<< "scale=2;${total[${i}]} / ${count[${i}]}" )
done | sort
This generates:
John 110 / 4 = 27.50
Mike 45 / 2 = 22.50
Paul 60 / 4 = 15.00
Robert 340 / 8 = 42.50
Upvotes: 0
Reputation: 34474
One awk
solution that prints averages to 2 decimal places and orders output by name:
awk '
{ total[$2]+=$1
count[$2]++
}
END { PROCINFO["sorted_in"]="@ind_str_asc"
for ( i in total )
printf "%-10s %5d / %-5d = %5.2f\n", i, total[i], count[i], total[i]/count[i]
}
' numbers.dat
This generates:
John 110 / 4 = 27.50
Mike 45 / 2 = 22.50
Paul 60 / 4 = 15.00
Robert 340 / 8 = 42.50
Upvotes: 1
Reputation: 35
Thanks a lot Raman, it worked... do you happen to know if would possible to perform a calculation on the same awk to get the average of each one? For example, John is 10+20+30+50 = 110, 110 / 4 = 27
Upvotes: 0
Reputation: 12877
awk '{ map[$2]+=$1 } END { for (i in map) { print i" "map[i] } }' file
Using awk, create an array with the name as the first index and a running total of the values for each name. At the end, print the names and totals.
Upvotes: 1