Running Total Based on group by and sum of multiple columns pandas

Question

I am working with the below dataset and I am running into trouble calculating total score based on team id. A team can be either home or away and I am looking to calculate a running total of their score.

I have successfully created a running total based on home_id and away_id as well as a running average based on home/away id, but I am struggling to calculate based on both columns

For example if in game 1 home team scores 1 and then in game 2 they are the away team and score 3 I would like to create a column that says they have scored 4 in total up to this point in the dataset

My code so far is:

import pandas as pd

game_data = pd.read_csv('game_data.csv')

game_data['home_avg_home_games'] = game_data.groupby('home_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['home_avg_against_home_games'] = game_data.groupby('home_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())

game_data['away_avg_away_games'] = game_data.groupby('away_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['away_avg_against_away_games'] = game_data.groupby('away_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())

game_data['scored_home_total'] = game_data.groupby('home_id')['home_score'].cumsum()
game_data['scored_away_total'] = game_data.groupby('away_id')['away_score'].cumsum()

game_id	away_id	home_id	away_score	home_score	home_avg_home_games	home_avg_against_home_games	away_avg_away_games	away_avg_against_away_games
446877	138	134	1	4	4	1	1	4
446911	141	139	5	3	3	5	5	3
446873	121	118	3	4	4	3	3	4
446875	137	158	12	3	3	12	12	3
446872	142	110	2	3	3	2	2	3
446876	136	140	2	3	3	2	2	3
446874	143	113	2	6	6	2	2	6
446879	120	144	4	3	3	4	4	3
446871	119	135	15	0	0	15	15	0
446878	141	139	5	3	3	5	5	3
446869	115	109	10	5	5	10	10	5
446889	112	108	9	0	0	9	9	0
446868	145	133	4	3	3	4	4	3
446870	117	147	5	3	3	5	5	3
446867	111	114	6	2	2	6	6	2
446896	121	118	2	0	2	2.5	2.5	2
446910	138	134	5	6	5	3	3	5
446887	141	139	2	3	3	4	4	3
446883	116	146	8	7	7	8	8	7
446886	136	140	10	2	2.5	6	6	2.5
446885	137	158	2	1	2	7	7	2
446882	115	109	6	11	8	8	8	8
446880	112	108	6	1	0.5	7.5	7.5	0.5
446881	145	133	5	4	3.5	4.5	4.5	3.5
446884	119	135	3	0	0	9	9	0
446901	141	139	3	5	3.5	3.75	3.75	3.5
446898	137	158	3	4	2.666666667	5.666666667	5.666666667	2.666666667
446899	136	140	9	5	3.333333333	7	7	3.333333333
446891	115	109	4	3	6.333333333	6.666666667	6.666666667	6.333333333

My desired output is:

game_id	away_id	home_id	away_score	home_score	home_avg_for_home_games	home_avg_against_home_games	away_avg_away_games	away_avg_against_away_games	home_total_score	away_total_score
446877	1	2	1	4	4	1	1	4	4	1
446911	2	3	5	3	3	5	5	3	3	5
446873	1	3	3	4	3.5	4	2	4	7	4

msamedozmen · Accepted Answer

This will help you to solve your problem. However it gives warning for avoid warning u need to use include_groups=False however if u that it remove home_id and away_id

game_data = game_data.groupby('home_id',group_keys=False,as_index=False).apply(lambda x: x.assign(home_total_score=x['home_score'].cumsum()),include_groups=True)
game_data= game_data.groupby('away_id',group_keys=False,as_index=False).apply(lambda x: x.assign(away_total_score=x['away_score'].cumsum()),include_groups=True)

I have solved that issue with like that I created new away_id with using np.where(). So you need to add import numpy as np in your code.

new = game_data.groupby('home_id',group_keys=False,as_index=False).apply(lambda x: x.assign(home_total_score=x['home_score'].cumsum()),include_groups=False)
new= game_data.groupby('away_id',group_keys=False,as_index=False).apply(lambda x: x.assign(away_total_score=x['away_score'].cumsum()),include_groups=False)

new["away_id"] = np.where(new["game_id"] == game_data["game_id"],game_data["away_id"],None)

For reindex columns add these line codes. Becasue we created new column and it added into last index.

columns = new.columns.tolist()

columns.insert(columns.index('away_score'), columns.pop(columns.index('away_id')))

new = new.reindex(columns=columns)

Running Total Based on group by and sum of multiple columns pandas

Answers (1)

Related Questions