Reputation: 5796
I have a pandas dataframe like this.
frame = pd.DataFrame({'home' : ['CHI', 'ATL', 'SEA', 'DET', 'STL','HOU' ,'CHI','CHI'],
'away' : ['DET', 'CHI', 'HOU', 'TOR', 'DAL', 'STL', 'MIA', 'SEA']})
Thanks to unutbu I can keep a running total of total games played for each team like this.
awayGP = collections.Counter()
homeGP = collections.Counter()
def count_games():
for idx, row in frame.iterrows():
homeGP[row['home']] +=1
awayGP[row['away']] +=1
test = homeGP + awayGP
yield awayGP[row['away']], awayGP[row['home']], , homeGP[row['away']], homeGP[row['home']]
frame['awayteamAwayGP'] , frame['hometeamAwayGP'], frame['awayteamHomeGP'], frame['hometeamHomeGP'] = zip(*list(count_games()))
frame['awayteamGames'] = frame['awayteamAwayGP'] + frame['awayteamHomeGP']
frame['hometeamGames'] = frame['hometeamAwayGP'] + frame['hometeamHomeGP']
del frame['awayteamAwayGP'] , frame['hometeamAwayGP'], frame['awayteamHomeGP'], frame['hometeamHomeGP']
I would like to be able to keep running total of the points scored for each team.
frame['awayPTS'] = [88, 75, 105, 99, 110, 85, 95, 100]
frame['homePTS'] = [92, 88, 95, 97, 100, 74, 98, 110]
This is the desired output.
away home awayteamGP hometeamGP awayPTS homePTS awayteam_totalPTS hometeam_totalPTS
DET CHI 1 1 88 92 88 92
CHI ATL 2 1 75 88 180 88
HOU SEA 1 1 105 95 105 95
TOR DET 1 2 99 97 99 187
DAL STL 1 1 110 100 110 100
STL HOU 2 2 85 74 185 179
MIA CHI 1 3 95 98 95 265
SEA CHI 2 4 100 110 195 375
Upvotes: 2
Views: 2153
Reputation: 375685
I think it makes sense to do a groupby
and then cumsum
each group. It's worth noting that this method will be significantly faster, than the Counter/defaultdict solution, when you have more items in your table (I see it twice as fast by 100 rows and fifty times faster with 10000 rows).
First we have to stack
in such a way we can do this independently (of away/home):
In [10]: frame.columns = [['away', 'away', 'home', 'home'],
['team', 'PTS', 'team', 'PTS']]
In [11]: frame # with nice descriptive column labels
Out[11]:
away away home home
team PTS team PTS
0 DET 88 CHI 92
1 CHI 75 ATL 88
2 HOU 105 SEA 95
3 TOR 99 DET 97
4 DAL 110 STL 100
5 STL 85 HOU 74
6 MIA 95 CHI 98
7 SEA 100 CHI 110
In [12]: frame_stacked = frame.stack(0)
In [13]: frame_stacked
Out[13]:
PTS team
0 away 88 DET
home 92 CHI
1 away 75 CHI
home 88 ATL
2 away 105 HOU
home 95 SEA
3 away 99 TOR
home 97 DET
4 away 110 DAL
home 100 STL
5 away 85 STL
home 74 HOU
6 away 95 MIA
home 98 CHI
7 away 100 SEA
home 110 CHI
Now we can groupby the team here (and cumsum will include both their away and home games):
In [14]: total_pts = frame_stacked.groupby('team')['PTS'].cumsum()
In [15]: total_pts
Out[15]:
0 away 88
home 92
1 away 167
home 88
2 away 105
home 95
3 away 99
home 185
4 away 110
home 100
5 away 185
home 179
6 away 95
home 265
7 away 195
home 375
dtype: int64
Finally, we just have to insert these into the frame with the correctly named columns:
In [16]: frame[('home', 'totalPTS')] = total_pts[:, 'home']
In [17]: frame[('away', 'totalPTS')] = total_pts[:, 'away']
In [18]: frame
Out[18]:
away away home home away home
team PTS team PTS totalPTS totalPTS
0 DET 88 CHI 92 88 92
1 CHI 75 ATL 88 167 88
2 HOU 105 SEA 95 105 95
3 TOR 99 DET 97 99 185
4 DAL 110 STL 100 110 100
5 STL 85 HOU 74 185 179
6 MIA 95 CHI 98 95 265
7 SEA 100 CHI 110 195 375
Upvotes: 4
Reputation: 46616
Create a defualtdict
(with default value 0) where you will keep the current scores of the teams, and apply along the axis=1
a function that updates this dictionary and returns a tuple of results. Then just concatenate your DataFrame and the resulting DataFrame from the apply
function along axis=1
.
frame = pd.DataFrame({
'home' : ['CHI', 'ATL', 'SEA', 'DET', 'STL','HOU' ,'CHI','CHI'],
'away' : ['DET', 'CHI', 'HOU', 'TOR', 'DAL', 'STL', 'MIA', 'SEA'],
'awayPTS' : [88, 75, 105, 99, 110, 85, 95, 100],
'homePTS' : [92, 88, 95, 97, 100, 74, 98, 110],
})
score = collections.defaultdict(int)
def calculate(row):
away = row['away']
home = row['home']
score[away] += row['awayPTS']
score[home] += row['homePTS']
return pd.Series([score[away], score[home]],
index=['awayteam_totalPTS', 'hometeam_totalPTS'])
frame = pd.concat([frame, frame.apply(calculate, axis=1)], axis=1)
Gives:
away home awayPTS homePTS awayteam_totalPTS hometeam_totalPTS
0 DET CHI 88 92 88 92
1 CHI ATL 75 88 167 88
2 HOU SEA 105 95 105 95
3 TOR DET 99 97 99 185
4 DAL STL 110 100 110 100
5 STL HOU 85 74 185 179
6 MIA CHI 95 98 95 265
7 SEA CHI 100 110 195 375
Upvotes: 3