Nate B
Nate B

Reputation: 27

Running Total Based on group by and sum of multiple columns pandas

I am working with the below dataset and I am running into trouble calculating total score based on team id. A team can be either home or away and I am looking to calculate a running total of their score.

I have successfully created a running total based on home_id and away_id as well as a running average based on home/away id, but I am struggling to calculate based on both columns

For example if in game 1 home team scores 1 and then in game 2 they are the away team and score 3 I would like to create a column that says they have scored 4 in total up to this point in the dataset

My code so far is:

import pandas as pd

game_data = pd.read_csv('game_data.csv')

game_data['home_avg_home_games'] = game_data.groupby('home_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['home_avg_against_home_games'] = game_data.groupby('home_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())

game_data['away_avg_away_games'] = game_data.groupby('away_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['away_avg_against_away_games'] = game_data.groupby('away_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())

game_data['scored_home_total'] = game_data.groupby('home_id')['home_score'].cumsum()
game_data['scored_away_total'] = game_data.groupby('away_id')['away_score'].cumsum()
game_id away_id home_id away_score home_score home_avg_home_games home_avg_against_home_games away_avg_away_games away_avg_against_away_games
446877 138 134 1 4 4 1 1 4
446911 141 139 5 3 3 5 5 3
446873 121 118 3 4 4 3 3 4
446875 137 158 12 3 3 12 12 3
446872 142 110 2 3 3 2 2 3
446876 136 140 2 3 3 2 2 3
446874 143 113 2 6 6 2 2 6
446879 120 144 4 3 3 4 4 3
446871 119 135 15 0 0 15 15 0
446878 141 139 5 3 3 5 5 3
446869 115 109 10 5 5 10 10 5
446889 112 108 9 0 0 9 9 0
446868 145 133 4 3 3 4 4 3
446870 117 147 5 3 3 5 5 3
446867 111 114 6 2 2 6 6 2
446896 121 118 2 0 2 2.5 2.5 2
446910 138 134 5 6 5 3 3 5
446887 141 139 2 3 3 4 4 3
446883 116 146 8 7 7 8 8 7
446886 136 140 10 2 2.5 6 6 2.5
446885 137 158 2 1 2 7 7 2
446882 115 109 6 11 8 8 8 8
446880 112 108 6 1 0.5 7.5 7.5 0.5
446881 145 133 5 4 3.5 4.5 4.5 3.5
446884 119 135 3 0 0 9 9 0
446901 141 139 3 5 3.5 3.75 3.75 3.5
446898 137 158 3 4 2.666666667 5.666666667 5.666666667 2.666666667
446899 136 140 9 5 3.333333333 7 7 3.333333333
446891 115 109 4 3 6.333333333 6.666666667 6.666666667 6.333333333

My desired output is:

game_id away_id home_id away_score home_score home_avg_for_home_games home_avg_against_home_games away_avg_away_games away_avg_against_away_games home_total_score away_total_score
446877 1 2 1 4 4 1 1 4 4 1
446911 2 3 5 3 3 5 5 3 3 5
446873 1 3 3 4 3.5 4 2 4 7 4

Upvotes: 1

Views: 122

Answers (1)

msamedozmen
msamedozmen

Reputation: 1129

This will help you to solve your problem. However it gives warning for avoid warning u need to use include_groups=False however if u that it remove home_id and away_id

game_data = game_data.groupby('home_id',group_keys=False,as_index=False).apply(lambda x: x.assign(home_total_score=x['home_score'].cumsum()),include_groups=True)
game_data= game_data.groupby('away_id',group_keys=False,as_index=False).apply(lambda x: x.assign(away_total_score=x['away_score'].cumsum()),include_groups=True)

I have solved that issue with like that I created new away_id with using np.where(). So you need to add import numpy as np in your code.

new = game_data.groupby('home_id',group_keys=False,as_index=False).apply(lambda x: x.assign(home_total_score=x['home_score'].cumsum()),include_groups=False)
new= game_data.groupby('away_id',group_keys=False,as_index=False).apply(lambda x: x.assign(away_total_score=x['away_score'].cumsum()),include_groups=False)

new["away_id"] = np.where(new["game_id"] == game_data["game_id"],game_data["away_id"],None)

For reindex columns add these line codes. Becasue we created new column and it added into last index.

columns = new.columns.tolist()

columns.insert(columns.index('away_score'), columns.pop(columns.index('away_id')))

new = new.reindex(columns=columns)

Upvotes: 1

Related Questions