Reputation: 797
I have a data-set that is in the shape of this, where each row represents a in a specific match that is specified by the gameID
.
gameID Won/Lost Home Away metric2 metric3 metric4 team1 team2 team3 team4
2017020001 1 1 0 10 10 10 1 0 0 0
2017020001 0 0 1 10 10 10 0 1 0 0
The thing I want to do is create a function that takes the rows with the same gameID
and joins them. As you can see in data example below, the two rows represents one game that is split up into a home team (row_1 ) and an away team (row_2). I want these two rows to sit on one row only.
Won/Lost h_metric2 h_metric3 h_metric4 a_metric2 a_metric3 a_metric4 h_team1 h_team2 h_team3 h_team4 a_team1 a_team2 a_team3 a_team4
1 10 10 10 10 10 10 1 0 0 0 0 1 0 0
How do I get this result?
EDIT: I created too much confusion, posting my code so you can get a better grasp of the problem I want to solve.
import numpy as np
import pandas as pd
import requests
import json
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
results = []
for game_id in range(2017020001, 2017020010, 1):
url = 'https://statsapi.web.nhl.com/api/v1/game/{}/boxscore'.format(game_id)
r = requests.get(url)
game_data = r.json()
for homeaway in ['home','away']:
game_dict = game_data.get('teams').get(homeaway).get('teamStats').get('teamSkaterStats')
game_dict['team'] = game_data.get('teams').get(homeaway).get('team').get('name')
game_dict['homeaway'] = homeaway
game_dict['game_id'] = game_id
results.append(game_dict)
df = pd.DataFrame(results)
df['Won/Lost'] = df.groupby('game_id')['goals'].apply(lambda g: (g == g.max()).map({True: 1, False: 0}))
df["faceOffWinPercentage"] = df["faceOffWinPercentage"].astype('float')
df["powerPlayPercentage"] = df["powerPlayPercentage"].astype('float')
df["team"] = df["team"].astype('category')
df = pd.get_dummies(df, columns=['homeaway'])
df = pd.get_dummies(df, columns=['team'])
Upvotes: 1
Views: 5011
Reputation: 78700
This is under the assumption that you have exactly two rows per gameID
and that you want to group by that ID. (It also assumes that I understand the question.)
Improved solution
Given a dataframe df
such as
gameID Won/Lost Home Away metric2 metric3 metric4 team1 team2 team3 team4
0 2017020001 1 1 0 10 10 10 1 0 0 0
1 2017020001 0 0 1 10 10 10 0 1 0 0
2 2017020002 1 1 0 10 10 10 1 0 0 0
3 2017020002 0 0 1 10 10 10 0 1 0 0
you can use pd.merge
(and some data munging) like this:
>>> is_home = df['Home'] == 1
>>> home = df[is_home].drop(['Home', 'Away'], axis=1).add_prefix('h_').rename(columns={'h_gameID':'gameID'})
>>> away = df[~is_home].drop(['Won/Lost', 'Home', 'Away'], axis=1).add_prefix('a_').rename(columns={'a_gameID':'gameID'})
>>> pd.merge(home, away, on='gameID')
gameID h_Won/Lost h_metric2 h_metric3 h_metric4 h_team1 h_team2 h_team3 h_team4 a_metric2 a_metric3 a_metric4 a_team1 a_team2 a_team3 a_team4
0 2017020001 1 10 10 10 1 0 0 0 10 10 10 0 1 0 0
1 2017020002 1 10 10 10 1 0 0 0 10 10 10 0 1 0 0
(I kept the prefix for Won/Lost
because it indicates that it's the statistic for the home team. Also, if anybody knows how to add the prefixes more elegantly without having to re-rename the gameID
please leave a comment.)
Original Attempt
You can apply the following function after grouping
def munge(group):
is_home = group.Home == 1
wonlost = group.loc[is_home, 'Won/Lost'].reset_index(drop=True)
group = group.loc[:, 'metric2':]
home = group[is_home].add_prefix('h_').reset_index(drop=True)
away = group[~is_home].add_prefix('a_').reset_index(drop=True)
return pd.concat([wonlost, home, away], axis=1)
... like this:
>>> df.groupby('gameID').apply(munge).reset_index(level=1, drop=True)
Won/Lost h_metric2 h_metric3 h_metric4 h_team1 h_team2 h_team3 h_team4 a_metric2 a_metric3 a_metric4 a_team1 a_team2 a_team3 a_team4
gameID
2017020001 1 10 10 10 1 0 0 0 10 10 10 0 1 0 0
2017020002 1 10 10 10 1 0 0 0 10 10 10 0 1 0 0
Upvotes: 1
Reputation: 11
i just suppose, you are working with bread and butter: numpy, pandas & co?
if so, i furthermore assume, that your table currently is being stored in a pandas.DataFrame-instance called 'df':
Divide your df into two df's and then join them:
df_team1 = df[df['Won/Lost']==1]
df_team2 = df[df['Won/Lost']==0]
final_df = df_team1.join(df_team2, lsuffix='_team1', rsuffix='_team2', on='gameID')
You can, of course, edit it to better match your purposes. For instance create the df's based on Home/Away columns, etc.
BR Ben :]
Upvotes: 1