Reputation: 17
I want to find the average number of points scored by the team per game in each season.
Is there a easy way to consider both the case when the team won Wscore
and when the team lost Lscore
?
Season Daynum Wteam Wscore Lteam Lscore Wloc Numot
0 1985 20 1228 81 1328 64 N 0
1 1985 25 1106 77 1354 70 H 0
season
- this is the year of the associated entry in seasons.csv (the year in which the final tournament occurs)
daynum
- this integer always ranges from 0 to 132, and tells you what day the game was played on. It represents an offset from the dayzero
date in the seasons.csv
file. For example, the first game in the file was daynum=20. Combined with the fact from the season.csv
file that day zero was 10/29/1984, that means the first game was played 20 days later, or 11/18/1984. There are no teams that ever played more than one game on a given date, so you can use this fact if you need a unique key. In order to accomplish this uniqueness, we had to adjust one game's date. In March 2008, the SEC postseason tournament had to reschedule one game (Georgia-Kentucky) to a subsequent day, so Georgia had to actually play two games on the same day. In order to enforce this uniqueness, we moved the game date for the Georgia-Kentucky game back to its original date.
wteam
- this identifies the id number of the team that won the game, as listed in the teams.csv
file. No matter whether the game was won by the home team or visiting team, wteam
always identifies the winning team.
wscore
- this identifies the number of points scored by the winning team.
lteam
- this identifies the id number of the team that lost the game.
lscore
- this identifies the number of points scored by the losing team.
numot
- this indicates the number of overtime periods in the game, an integer 0 or higher.
wloc
- this identifies the location
of the winning team. If the winning team was the home team, this value will be H
. If the winning team was the visiting team, this value will be A
. If it was played on a neutral court, then this value will be N
. Sometimes it is unclear whether the site should be considered neutral, since it is near one team's home court, or even on their court during a tournament, but for this determination we have simply used the Kenneth Massey data in its current state, where the @
sign is either listed with the winning team, the losing team, or neither team.
Upvotes: 0
Views: 272
Reputation: 8508
Here's a way to go about doing this.
import pandas as pd
cols = ['Season','Daynum','Wteam','Wscore','Lteam','Lscore','Wloc','Numot']
colx = ['Season','Daynum','Team', 'Score']
dt = [[1985, 20, 1228, 81, 1328, 64, 'N', 0],
[1985, 20, 1354, 77, 1228, 70, 'H', 0],
[1985, 25, 1106, 77, 1354, 70, 'H', 0],
[1985, 25, 1328, 82, 1106, 78, 'N', 0],
[1985, 21, 1106, 77, 1228, 65, 'N', 0]]
df = pd.DataFrame(data=dt,columns=cols)
#take the first 4 columns into d1. Keep only Season, Wteam, Wscore
#rename Wteam and Wscore as Team and Score
d1 = df.iloc[:,:4]
d1 = d1.rename(columns={'Wteam': 'Team', 'Wscore': 'Score'})
d1 = d1.drop(columns = ['Daynum'])
#take the first 4 columns into d1. Keep only Season, Lteam, Lscore
#rename Lteam and Lscore as Team and Score
d2 = df.iloc[:,:6]
d2 = d2.drop(columns =['Daynum','Wteam','Wscore'])
d2 = d2.rename(columns={'Lteam': 'Team', 'Lscore': 'Score'})
#Now you have two dataframes that have common name but subsets of original dataframe
d3 = pd.concat([d1,d2])
#Group by Season and Team to get the average Score for each team by each Season
print(d3 = d3.groupby(['Season','Team']).mean())
#To show you the breakup of each set, I am printing all the dataframes
print(df)
print(d1)
print(d2)
print(d3)
DataFrame of Average Score by Season and Team
Score
Season Team
1985 1106 77.333333
1228 72.000000
1328 73.000000
1354 73.500000
Original Dataframe:
Season Daynum Wteam Wscore Lteam Lscore Wloc Numot
0 1985 20 1228 81 1328 64 N 0
1 1985 20 1354 77 1228 70 H 0
2 1985 25 1106 77 1354 70 H 0
3 1985 25 1328 82 1106 78 N 0
4 1985 21 1106 77 1228 65 N 0
DataFrame with only list of Win Teams
Season Team Score
0 1985 1228 81
1 1985 1354 77
2 1985 1106 77
3 1985 1328 82
4 1985 1106 77
DataFrame with only list of Loss Teams
Season Team Score
0 1985 1328 64
1 1985 1228 70
2 1985 1354 70
3 1985 1106 78
4 1985 1228 65
Merged DataFrame with list of both Win & Loss Teams
This will be the list you need to group by and calculate the average
Season Team Score
0 1985 1228 81
1 1985 1354 77
2 1985 1106 77
3 1985 1328 82
4 1985 1106 77
0 1985 1328 64
1 1985 1228 70
2 1985 1354 70
3 1985 1106 78
4 1985 1228 65
Upvotes: 1