user2333196
user2333196

Reputation: 5796

calculating running totals in separate columns in Pandas DF

I have a pandas dataframe like this.

frame =  pd.DataFrame({'home' : ['CHI', 'ATL', 'SEA', 'DET', 'STL','HOU' ,'CHI','CHI'],
                   'away' : ['DET', 'CHI', 'HOU', 'TOR', 'DAL', 'STL', 'MIA', 'SEA']})

Thanks to unutbu I can keep a running total of total games played for each team like this.

awayGP = collections.Counter()
homeGP = collections.Counter()

def count_games():
    for idx, row in frame.iterrows():
        homeGP[row['home']] +=1
        awayGP[row['away']] +=1
        test = homeGP + awayGP
        yield awayGP[row['away']], awayGP[row['home']], , homeGP[row['away']], homeGP[row['home']]

frame['awayteamAwayGP'] , frame['hometeamAwayGP'], frame['awayteamHomeGP'],          frame['hometeamHomeGP'] = zip(*list(count_games()))                                                             
frame['awayteamGames'] =   frame['awayteamAwayGP'] + frame['awayteamHomeGP']                                                                                       
frame['hometeamGames'] =   frame['hometeamAwayGP'] + frame['hometeamHomeGP']        
del frame['awayteamAwayGP'] , frame['hometeamAwayGP'], frame['awayteamHomeGP'],    frame['hometeamHomeGP']  

I would like to be able to keep running total of the points scored for each team.

frame['awayPTS'] = [88, 75, 105, 99, 110, 85, 95, 100]
frame['homePTS'] = [92, 88, 95, 97, 100, 74, 98, 110]

This is the desired output.

away  home  awayteamGP  hometeamGP awayPTS  homePTS awayteam_totalPTS hometeam_totalPTS 
 DET   CHI     1             1       88        92          88               92
 CHI   ATL     2             1       75        88          180              88
 HOU   SEA     1             1       105       95          105              95
 TOR   DET     1             2       99        97          99               187
 DAL   STL     1             1       110       100         110              100
 STL   HOU     2             2       85        74          185              179
 MIA   CHI     1             3       95        98          95               265
 SEA   CHI     2             4       100       110         195              375

Upvotes: 2

Views: 2153

Answers (2)

Andy Hayden
Andy Hayden

Reputation: 375685

I think it makes sense to do a groupby and then cumsum each group. It's worth noting that this method will be significantly faster, than the Counter/defaultdict solution, when you have more items in your table (I see it twice as fast by 100 rows and fifty times faster with 10000 rows).

First we have to stack in such a way we can do this independently (of away/home):

In [10]: frame.columns = [['away', 'away', 'home', 'home'],
                          ['team', 'PTS', 'team', 'PTS']]

In [11]: frame  # with nice descriptive column labels
Out[11]: 
  away  away home  home
  team   PTS team   PTS
0  DET    88  CHI    92
1  CHI    75  ATL    88
2  HOU   105  SEA    95
3  TOR    99  DET    97
4  DAL   110  STL   100
5  STL    85  HOU    74
6  MIA    95  CHI    98
7  SEA   100  CHI   110

In [12]: frame_stacked = frame.stack(0)

In [13]: frame_stacked
Out[13]: 
        PTS team
0 away   88  DET
  home   92  CHI
1 away   75  CHI
  home   88  ATL
2 away  105  HOU
  home   95  SEA
3 away   99  TOR
  home   97  DET
4 away  110  DAL
  home  100  STL
5 away   85  STL
  home   74  HOU
6 away   95  MIA
  home   98  CHI
7 away  100  SEA
  home  110  CHI

Now we can groupby the team here (and cumsum will include both their away and home games):

In [14]: total_pts = frame_stacked.groupby('team')['PTS'].cumsum()

In [15]: total_pts
Out[15]: 
0  away     88
   home     92
1  away    167
   home     88
2  away    105
   home     95
3  away     99
   home    185
4  away    110
   home    100
5  away    185
   home    179
6  away     95
   home    265
7  away    195
   home    375
dtype: int64

Finally, we just have to insert these into the frame with the correctly named columns:

In [16]: frame[('home', 'totalPTS')] = total_pts[:, 'home']

In [17]: frame[('away', 'totalPTS')] = total_pts[:, 'away']

In [18]: frame
Out[18]: 
  away  away home  home     away     home
  team   PTS team   PTS totalPTS totalPTS
0  DET    88  CHI    92       88       92
1  CHI    75  ATL    88      167       88
2  HOU   105  SEA    95      105       95
3  TOR    99  DET    97       99      185
4  DAL   110  STL   100      110      100
5  STL    85  HOU    74      185      179
6  MIA    95  CHI    98       95      265
7  SEA   100  CHI   110      195      375

Upvotes: 4

Viktor Kerkez
Viktor Kerkez

Reputation: 46616

Create a defualtdict (with default value 0) where you will keep the current scores of the teams, and apply along the axis=1 a function that updates this dictionary and returns a tuple of results. Then just concatenate your DataFrame and the resulting DataFrame from the apply function along axis=1.

frame =  pd.DataFrame({
    'home'    : ['CHI', 'ATL', 'SEA', 'DET', 'STL','HOU' ,'CHI','CHI'],
    'away'    : ['DET', 'CHI', 'HOU', 'TOR', 'DAL', 'STL', 'MIA', 'SEA'],
    'awayPTS' : [88, 75, 105, 99, 110, 85, 95, 100],
    'homePTS' : [92, 88, 95, 97, 100, 74, 98, 110],
})

score = collections.defaultdict(int)
def calculate(row):
    away = row['away']
    home = row['home']
    score[away] += row['awayPTS']
    score[home] += row['homePTS']
    return pd.Series([score[away], score[home]],
                     index=['awayteam_totalPTS', 'hometeam_totalPTS'])
frame = pd.concat([frame, frame.apply(calculate, axis=1)], axis=1)

Gives:

  away home  awayPTS  homePTS  awayteam_totalPTS  hometeam_totalPTS
0  DET  CHI       88       92                 88                 92
1  CHI  ATL       75       88                167                 88
2  HOU  SEA      105       95                105                 95
3  TOR  DET       99       97                 99                185
4  DAL  STL      110      100                110                100
5  STL  HOU       85       74                185                179
6  MIA  CHI       95       98                 95                265
7  SEA  CHI      100      110                195                375

Upvotes: 3

Related Questions