furry Nifty
furry Nifty

Reputation: 17

Pandas Dataframe simple Stats

I want to find the average number of points scored by the team per game in each season.

Is there a easy way to consider both the case when the team won Wscore and when the team lost Lscore?

    Season  Daynum  Wteam   Wscore  Lteam   Lscore  Wloc    Numot
0   1985    20     1228     81      1328       64     N      0
1   1985    25     1106     77      1354       70     H      0

season - this is the year of the associated entry in seasons.csv (the year in which the final tournament occurs)

daynum - this integer always ranges from 0 to 132, and tells you what day the game was played on. It represents an offset from the dayzero date in the seasons.csv file. For example, the first game in the file was daynum=20. Combined with the fact from the season.csv file that day zero was 10/29/1984, that means the first game was played 20 days later, or 11/18/1984. There are no teams that ever played more than one game on a given date, so you can use this fact if you need a unique key. In order to accomplish this uniqueness, we had to adjust one game's date. In March 2008, the SEC postseason tournament had to reschedule one game (Georgia-Kentucky) to a subsequent day, so Georgia had to actually play two games on the same day. In order to enforce this uniqueness, we moved the game date for the Georgia-Kentucky game back to its original date.

wteam - this identifies the id number of the team that won the game, as listed in the teams.csv file. No matter whether the game was won by the home team or visiting team, wteam always identifies the winning team.

wscore - this identifies the number of points scored by the winning team.

lteam - this identifies the id number of the team that lost the game.

lscore - this identifies the number of points scored by the losing team.

numot - this indicates the number of overtime periods in the game, an integer 0 or higher.

wloc - this identifies the location of the winning team. If the winning team was the home team, this value will be H. If the winning team was the visiting team, this value will be A. If it was played on a neutral court, then this value will be N. Sometimes it is unclear whether the site should be considered neutral, since it is near one team's home court, or even on their court during a tournament, but for this determination we have simply used the Kenneth Massey data in its current state, where the @ sign is either listed with the winning team, the losing team, or neither team.

Upvotes: 0

Views: 272

Answers (1)

Joe Ferndz
Joe Ferndz

Reputation: 8508

Here's a way to go about doing this.

import pandas as pd
cols = ['Season','Daynum','Wteam','Wscore','Lteam','Lscore','Wloc','Numot']
colx = ['Season','Daynum','Team', 'Score']
dt   = [[1985, 20, 1228, 81, 1328, 64, 'N', 0],
        [1985, 20, 1354, 77, 1228, 70, 'H', 0],
        [1985, 25, 1106, 77, 1354, 70, 'H', 0],
        [1985, 25, 1328, 82, 1106, 78, 'N', 0],
        [1985, 21, 1106, 77, 1228, 65, 'N', 0]]

df = pd.DataFrame(data=dt,columns=cols)

#take the first 4 columns into d1. Keep only Season, Wteam, Wscore
#rename Wteam and Wscore as Team and Score
d1 = df.iloc[:,:4]
d1 = d1.rename(columns={'Wteam': 'Team', 'Wscore': 'Score'})
d1 = d1.drop(columns = ['Daynum'])

#take the first 4 columns into d1. Keep only Season, Lteam, Lscore
#rename Lteam and Lscore as Team and Score
d2 = df.iloc[:,:6]
d2 = d2.drop(columns =['Daynum','Wteam','Wscore'])
d2 = d2.rename(columns={'Lteam': 'Team', 'Lscore': 'Score'})

#Now you have two dataframes that have common name but subsets of original dataframe
d3 = pd.concat([d1,d2])

#Group by Season and Team to get the average Score for each team by each Season
print(d3 = d3.groupby(['Season','Team']).mean())

#To show you the breakup of each set, I am printing all the dataframes
print(df)
print(d1)
print(d2)
print(d3)

DataFrame of Average Score by Season and Team

                 Score
Season Team           
1985   1106  77.333333
       1228  72.000000
       1328  73.000000
       1354  73.500000

Original Dataframe:

   Season  Daynum  Wteam  Wscore  Lteam  Lscore Wloc  Numot
0    1985      20   1228      81   1328      64    N      0
1    1985      20   1354      77   1228      70    H      0
2    1985      25   1106      77   1354      70    H      0
3    1985      25   1328      82   1106      78    N      0
4    1985      21   1106      77   1228      65    N      0

DataFrame with only list of Win Teams

   Season  Team  Score
0    1985  1228     81
1    1985  1354     77
2    1985  1106     77
3    1985  1328     82
4    1985  1106     77

DataFrame with only list of Loss Teams

   Season  Team  Score
0    1985  1328     64
1    1985  1228     70
2    1985  1354     70
3    1985  1106     78
4    1985  1228     65

Merged DataFrame with list of both Win & Loss Teams

This will be the list you need to group by and calculate the average

   Season  Team  Score
0    1985  1228     81
1    1985  1354     77
2    1985  1106     77
3    1985  1328     82
4    1985  1106     77
0    1985  1328     64
1    1985  1228     70
2    1985  1354     70
3    1985  1106     78
4    1985  1228     65

Upvotes: 1

Related Questions