chasedcribbet
chasedcribbet

Reputation: 258

Pandas compares scores in row for baseball games

df = pd.DataFrame({'Date': [402, 402, 402, 402, 403, 403, 404, 404],
                   'Team' : ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
                   'Final' :[4, 6, 2, 5, 7, 2, 1, 2]})

df_expected =  pd.DataFrame({'Date': [402, 402, 402, 402, 403, 403, 404, 404],
                       'Team' : ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
                       'Final' :[4, 6, 2, 5, 7, 2, 1, 2], 
                       'Win_Loss': [0,1,0,1,1,0,0,1],
                       'Run_diff': [-2,2,-3,3,5,-5,-1,1]})

I am trying to create two columns: Run_diff for the run differential and a binary win/loss column.

Best I have been able to do so far:

Sets up odd even column to try and group games for analysis

df['Test'] = 1
for i, j in enumerate(df['Final']):
    if (i % 2) == 0: 
        df['Test'][i] = 'Even'
    else: 
        df['Test'][i] = 'Odd'
  

Attempt to get scores in a row for easier plus/minus

df['Shift'] = df['Final'].shift(fill_value = 0)

Attempting to work with two created columns above

conditions = [(df['Test'] == 'Odd'),
             (df['Test'] == 'Even')]

values = [df['Final'] - df['Shift'], 0]

df['Run_diff'] = np.select(conditions, values)

This works ok for any Odd columns, which is my attempt to group the rows as games. But I can't figure out how to get the Even columns to work.

You don't have to use my code, since it is not the most elegant. I am sure of that. I am more than willing to try and apply new/better techniques for this.

Thanks.

Upvotes: 2

Views: 79

Answers (2)

Henry Ecker
Henry Ecker

Reputation: 35676

You can re-index by odd and even rows.

Calculate run_diff and win_loss for both teams in a single row.

Then return your data to its initial order by stacking and re-indexing.

import pandas as pd

df = pd.DataFrame({'Date': [402, 402, 402, 402, 403, 403, 404, 404],
                   'Team': ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
                   'Final': [4, 6, 2, 5, 7, 2, 1, 2]})

# Re Index Based on Odd And Even Rows
new_df = df.set_index([df.index // 2, df.index % 2]).unstack()

# Calculate Run Diff in Both Directions
new_df['Run_Diff', 0] = new_df['Final', 0] - new_df['Final', 1]
new_df['Run_Diff', 1] = new_df['Final', 1] - new_df['Final', 0]
# Calculate Win Loss in Both Directions
new_df['Win_Loss', 0] = (new_df['Run_Diff', 0] > 0).astype(int)
new_df['Win_Loss', 1] = (new_df['Run_Diff', 1] > 0).astype(int)

# Remove Multi Index, Change Column Order
new_df = (
    new_df.stack(1)
        .reset_index(drop=True)[['Date', 'Team', 'Final', 'Win_Loss', 'Run_Diff']]
)
print(new_df)

Output:

   Date Team  Final  Win_Loss  Run_Diff
0   402  SFO      4         0        -2
1   402  ARI      6         1         2
2   402  CUB      2         0        -3
3   402  STL      5         1         3
4   403  NYY      7         1         5
5   403  SEA      2         0        -5
6   404  OAK      1         0        -1
7   404  LAA      2         1         1

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195553

I'd chose different approach. Looking at your data, I'd first transform your dataframe:

new_df = pd.DataFrame(
    {
        "Date": df["Date"].iloc[::2].values,
        "Team1": df["Team"].iloc[::2].values,
        "Team2": df["Team"].iloc[1::2].values,
        "Final1": df["Final"].iloc[::2].values,
        "Final2": df["Final"].iloc[1::2].values,
    }
)

This creates this new_df:

   Date Team1 Team2  Final1  Final2
0   402   SFO   ARI       4       6
1   402   CUB   STL       2       5
2   403   NYY   SEA       7       2
3   404   OAK   LAA       1       2

Then it's easy:

new_df["run_diff"] = new_df["Final1"] - new_df["Final2"]
new_df["win_loss"] = (new_df["run_diff"] < 0).astype(int)
print(new_df)

Prints:

   Date Team1 Team2  Final1  Final2  run_diff  win_loss
0   402   SFO   ARI       4       6        -2         1
1   402   CUB   STL       2       5        -3         1
2   403   NYY   SEA       7       2         5         0
3   404   OAK   LAA       1       2        -1         1

Upvotes: 1

Related Questions