Reputation: 113
I have the 2 following files:
points:
John,12
Joseph,14
Madison,15
Elijah,14
Theodore,15
Regina,18
teams:
Theodore,team1
Elijah,team2
Madison,team1
Joseph,team3
Regina,team2
John,team3
I would like to calculate the average points of each team.
I came up with a solution using only 2 awk
statements. But I would like to do it in a more efficient way (without using for
loops and if
statements).
Here is what I did:
#!/bin/bash
awk 'BEGIN { FS="," }
FNR==NR { a[FNR] = $1; b[FNR] = $2; next } { for(i = 0; i <= NR; ++i) { if(a[i] == $1) print b[i], $2 } }' teams points > output.txt
In this first awk
command, I am separating the teams (team1, team2, team3) from the names and created a new file containing only my teams and the proper points for each team (and therefor the necessity of using a for loop
and an if statement
).
Secondly:
awk 'BEGIN { FS=" ";
count_team1 = 0;
count_team2 = 0;
count_team3 = 0
average_team1 = 0;
average_team2 = 0;
average_team3 = 0 }
/team1/ { count_team1 = count_team1 + 1; average_team1 = average_team1 + $2 }
/team2/ { count_team2 = count_team2 + 1; average_team2 = average_team2 + $2 }
/team3/ { count_team3 = count_team3 + 1; average_team3 = average_team3 + $2 }
END { print "The average of team1 is: " average_team1 / count_team1;
print "The average of team2 is: " average_team2 / count_team2;
print "The average of team3 is: " average_team3 / count_team3 }' output.txt
In this second awk
command, I am simply creating variables to store how many members of each team I have and other variables to have the total number of points of each team. I is easy to do since my new file output.txt
only contains the teams and the scores.
This solution is working but as I said before I would like to do it without using a for
loop and an if
statement. I thought of not using FNR==NR
and use grep -f
for matching but I didn't get any conclusive results.
Upvotes: 3
Views: 175
Reputation: 37394
Using awk only:
$ awk -F, '
NR==FNR { # process teams file
a[$1]=$2 # hash to a: a[name]=team
next
}
{ # process points file
b[a[$1]]+=$2 # add points to b, index on team: b[team]=pointsum
c[a[$1]]++ # add count to c, index on team: c[team]=count
}
END {
for(i in b)
print i,b[i]/c[i] # compute average
}' teams points
team1 15
team2 16
team3 13
Edit: A solution without a for
loop in the END
:
If the teams file is sorted on the team, you can avoid the for
loop in the END
. As a bonus the teams are outputed in order:
$ awk -F, '
NR==FNR { # process the points file
a[$1]=$2 # hash to a on name a[name]=points
next
}
{ # process the sorted teams file
if($2!=p && FNR>1) { # then the team changes
print p,b/c # its time to output team name and average
b=c=0 # reset counters
}
c++ # count
b+=a[$1] # sum of points for the team
p=$2 # p stores the team name for testing on the next round
}
END { # in the END
print p,b/c # print for the last team
}' points <(sort -t, -k2 teams)
team1 15
team2 16
team3 13
Upvotes: 3
Reputation: 2868
Give a try to this
awk -F, '
$2 ~ /^[0-9][0-9]*$/ {
team_sum[team[$1]]+=$2
team_score_count[team[$1]]++
next
}
{
team[$1]=$2
}
END {
for (team_name in team_sum)
print "The average of " team_name " is " (team_sum[team_name]/team_score_count[team_name])
}' teams points
The average of team1 is 15
The average of team2 is 14
The average of team3 is 13
Upvotes: 1