V3NT0
V3NT0

Reputation: 35

Shell script to sum columns associated with a name

I have a file with thousands of numbers on column 1 and each sequence of these numbers are associated with a single person. Would someone have any idea on how can I create a shell script to sum column 1 for that specific person, eg:

John is 10+20+30+50 = 110

Output of the script would be: John 110 and so on and so forth..

I have tried with while, for, etc but I can't associate the sum to the person :(

Example of the file:

10 John
20 John
30 John
50 John
10 Paul
10 Paul
20 Paul
20 Paul
20 Robert
30 Robert
30 Robert 
60 Robert 
80 Robert
40 Robert
40 Robert
40 Robert
15 Mike
30 Mike

Upvotes: 0

Views: 656

Answers (4)

markp-fuso
markp-fuso

Reputation: 34474

Assumptions:

  • data resides in a file named numbers.dat
  • we'll store totals and counts in arrays but calculate averages simply for display (OP can decide if averages should also be stored in an array)

One bash solution using a couple associative arrays to keep track of our numbers:

unset      total count
declare -A total count

while read -r number name
do
    (( total[${name}] += $number))
    (( count[${name}] ++ ))
done < numbers.dat

typeset -p total count

This generates:

declare -A total=([Mike]="45" [Robert]="340" [John]="110" [Paul]="60" )
declare -A count=([Mike]="2" [Robert]="8" [John]="4" [Paul]="4" )

If we want integer based averages (ie, no decimal places):

for i in ${!total[@]}
do
   printf "%-10s %5d / %-5d = %5d\n" "${i}" "${total[${i}]}" "${count[${i}]}" $(( ${total[${i}]} / ${count[${i}]} ))
done

This generates:

Mike          45 / 2     =    22
Robert       340 / 8     =    42
John         110 / 4     =    27
Paul          60 / 4     =    15

If we want the averages to include, say, 2 decimal places:

for i in ${!total[@]}
do
   printf "%-10s %5d / %-5d = %5.2f\n" "${i}" "${total[${i}]}" "${count[${i}]}" $( bc <<< "scale=2;${total[${i}]} / ${count[${i}]}" )
done

This generates:

Mike          45 / 2     = 22.50
Robert       340 / 8     = 42.50
John         110 / 4     = 27.50
Paul          60 / 4     = 15.00

Output sorted by name:

for i in ${!total[@]}
do
   printf "%-10s %5d / %-5d = %5.2f\n" "${i}" "${total[${i}]}" "${count[${i}]}" $( bc <<< "scale=2;${total[${i}]} / ${count[${i}]}" )
done | sort

This generates:

John         110 / 4     = 27.50
Mike          45 / 2     = 22.50
Paul          60 / 4     = 15.00
Robert       340 / 8     = 42.50

Upvotes: 0

markp-fuso
markp-fuso

Reputation: 34474

One awk solution that prints averages to 2 decimal places and orders output by name:

awk '
    { total[$2]+=$1
      count[$2]++
    }
END { PROCINFO["sorted_in"]="@ind_str_asc"
      for ( i in total )
          printf "%-10s %5d / %-5d = %5.2f\n", i, total[i], count[i], total[i]/count[i]
    }
' numbers.dat

This generates:

John         110 / 4     = 27.50
Mike          45 / 2     = 22.50
Paul          60 / 4     = 15.00
Robert       340 / 8     = 42.50

Upvotes: 1

V3NT0
V3NT0

Reputation: 35

Thanks a lot Raman, it worked... do you happen to know if would possible to perform a calculation on the same awk to get the average of each one? For example, John is 10+20+30+50 = 110, 110 / 4 = 27

Upvotes: 0

Raman Sailopal
Raman Sailopal

Reputation: 12877

awk '{ map[$2]+=$1 } END { for (i in map) { print i" "map[i] } }' file

Using awk, create an array with the name as the first index and a running total of the values for each name. At the end, print the names and totals.

Upvotes: 1

Related Questions