Reputation: 258
df = pd.DataFrame({'Date': [402, 402, 402, 402, 403, 403, 404, 404],
'Team' : ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
'Final' :[4, 6, 2, 5, 7, 2, 1, 2]})
df_expected = pd.DataFrame({'Date': [402, 402, 402, 402, 403, 403, 404, 404],
'Team' : ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
'Final' :[4, 6, 2, 5, 7, 2, 1, 2],
'Win_Loss': [0,1,0,1,1,0,0,1],
'Run_diff': [-2,2,-3,3,5,-5,-1,1]})
I am trying to create two columns: Run_diff for the run differential and a binary win/loss column.
Best I have been able to do so far:
Sets up odd even column to try and group games for analysis
df['Test'] = 1
for i, j in enumerate(df['Final']):
if (i % 2) == 0:
df['Test'][i] = 'Even'
else:
df['Test'][i] = 'Odd'
Attempt to get scores in a row for easier plus/minus
df['Shift'] = df['Final'].shift(fill_value = 0)
Attempting to work with two created columns above
conditions = [(df['Test'] == 'Odd'),
(df['Test'] == 'Even')]
values = [df['Final'] - df['Shift'], 0]
df['Run_diff'] = np.select(conditions, values)
This works ok for any Odd columns, which is my attempt to group the rows as games. But I can't figure out how to get the Even columns to work.
You don't have to use my code, since it is not the most elegant. I am sure of that. I am more than willing to try and apply new/better techniques for this.
Thanks.
Upvotes: 2
Views: 79
Reputation: 35676
You can re-index by odd and even rows.
Calculate run_diff and win_loss for both teams in a single row.
Then return your data to its initial order by stacking and re-indexing.
import pandas as pd
df = pd.DataFrame({'Date': [402, 402, 402, 402, 403, 403, 404, 404],
'Team': ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
'Final': [4, 6, 2, 5, 7, 2, 1, 2]})
# Re Index Based on Odd And Even Rows
new_df = df.set_index([df.index // 2, df.index % 2]).unstack()
# Calculate Run Diff in Both Directions
new_df['Run_Diff', 0] = new_df['Final', 0] - new_df['Final', 1]
new_df['Run_Diff', 1] = new_df['Final', 1] - new_df['Final', 0]
# Calculate Win Loss in Both Directions
new_df['Win_Loss', 0] = (new_df['Run_Diff', 0] > 0).astype(int)
new_df['Win_Loss', 1] = (new_df['Run_Diff', 1] > 0).astype(int)
# Remove Multi Index, Change Column Order
new_df = (
new_df.stack(1)
.reset_index(drop=True)[['Date', 'Team', 'Final', 'Win_Loss', 'Run_Diff']]
)
print(new_df)
Output:
Date Team Final Win_Loss Run_Diff
0 402 SFO 4 0 -2
1 402 ARI 6 1 2
2 402 CUB 2 0 -3
3 402 STL 5 1 3
4 403 NYY 7 1 5
5 403 SEA 2 0 -5
6 404 OAK 1 0 -1
7 404 LAA 2 1 1
Upvotes: 1
Reputation: 195553
I'd chose different approach. Looking at your data, I'd first transform your dataframe:
new_df = pd.DataFrame(
{
"Date": df["Date"].iloc[::2].values,
"Team1": df["Team"].iloc[::2].values,
"Team2": df["Team"].iloc[1::2].values,
"Final1": df["Final"].iloc[::2].values,
"Final2": df["Final"].iloc[1::2].values,
}
)
This creates this new_df
:
Date Team1 Team2 Final1 Final2
0 402 SFO ARI 4 6
1 402 CUB STL 2 5
2 403 NYY SEA 7 2
3 404 OAK LAA 1 2
Then it's easy:
new_df["run_diff"] = new_df["Final1"] - new_df["Final2"]
new_df["win_loss"] = (new_df["run_diff"] < 0).astype(int)
print(new_df)
Prints:
Date Team1 Team2 Final1 Final2 run_diff win_loss
0 402 SFO ARI 4 6 -2 1
1 402 CUB STL 2 5 -3 1
2 403 NYY SEA 7 2 5 0
3 404 OAK LAA 1 2 -1 1
Upvotes: 1