Reputation: 61
I have an Excel spreadsheet that calculates the winning/losing streak of an NFL dataset using the Team and Win/Loss columns, and I'd like to put this into a panda dataframe as part of my automation project (just a hobby). Here's a sample of the data in pandas:
import pandas as pd
df = pd.DataFrame({'DB Date': {0: 20190905,
1: 20190905,
2: 20190915,
3: 20190915,
4: 20190922,
5: 20190923},
'Week': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3},
'Team': {0: 'Bears',
1: 'Packers',
2: 'Bears',
3: 'Packers',
4: 'Packers',
5: 'Bears'},
'Opponent': {0: 'Packers',
1: 'Bears',
2: 'Broncos',
3: 'Vikings',
4: 'Broncos',
5: 'Redskins'},
'Home/Away': {0: 'home',
1: 'away',
2: 'away',
3: 'home',
4: 'home',
5: 'away'},
'Team Score': {0: 3, 1: 10, 2: 16, 3: 21, 4: 27, 5: 31},
'Opp Score': {0: 10, 1: 3, 2: 14, 3: 16, 4: 16, 5: 15},
'Win/Loss': {0: 'Loss', 1: 'Win', 2: 'Win', 3: 'Win', 4: 'Win', 5: 'Win'}})
I'm trying to add a new column that calculates the winning/losing streak to this dataframe. Here's what I have from my Excel file:
DB Date Week Team Opponent Team Score Opp Score Win/Loss
20190905 1 Bears Packers 3 10 L1
20190905 1 Packers Bears 10 3 W1
20190915 2 Bears Broncos 16 14 W1
20190915 2 Packers Vikings 21 16 W2
20190922 3 Packers Broncos 27 16 W3
20190923 3 Bears Redskins 31 15 W2
20190926 4 Packers Eagles 27 34 L1
20190929 4 Bears Vikings 16 6 W3
20191006 5 Packers Cowboys 34 24 W1
20191006 5 Bears Raiders 21 24 L1
As a total beginner to Python, I have no clue how to create a function that will create this "Win Streak" column. I'd greatly appreciate any help with this!
Edit:
{'DB Date': {0: 20190905,
1: 20190905,
2: 20190915,
3: 20190915,
4: 20190922,
5: 20190923,
6: 20190926,
7: 20190929,
8: 20191006,
9: 20191006,
10: 20191014,
11: 20191020,
12: 20191020},
'Week': {0: 1,
1: 1,
2: 2,
3: 2,
4: 3,
5: 3,
6: 4,
7: 4,
8: 5,
9: 5,
10: 6,
11: 7,
12: 7},
'Team': {0: 'Bears',
1: 'Packers',
2: 'Bears',
3: 'Packers',
4: 'Packers',
5: 'Bears',
6: 'Packers',
7: 'Bears',
8: 'Packers',
9: 'Bears',
10: 'Packers',
11: 'Bears',
12: 'Packers'},
'Opponent': {0: 'Packers',
1: 'Bears',
2: 'Broncos',
3: 'Vikings',
4: 'Broncos',
5: 'Redskins',
6: 'Eagles',
7: 'Vikings',
8: 'Cowboys',
9: 'Raiders',
10: 'Lions',
11: 'Saints',
12: 'Raiders'},
'Home/Away': {0: 'home',
1: 'away',
2: 'away',
3: 'home',
4: 'home',
5: 'away',
6: 'home',
7: 'home',
8: 'away',
9: 'neutral',
10: 'home',
11: 'home',
12: 'home'},
'Team Score': {0: 3,
1: 10,
2: 16,
3: 21,
4: 27,
5: 31,
6: 27,
7: 16,
8: 34,
9: 21,
10: 23,
11: 25,
12: 42},
'Opp Score': {0: 10,
1: 3,
2: 14,
3: 16,
4: 16,
5: 15,
6: 34,
7: 6,
8: 24,
9: 24,
10: 22,
11: 36,
12: 24},
'Win/Loss': {0: 'Loss',
1: 'Win',
2: 'Win',
3: 'Win',
4: 'Win',
5: 'Win',
6: 'Loss',
7: 'Win',
8: 'Win',
9: 'Loss',
10: 'Win',
11: 'Loss',
12: 'Win'}}
Upvotes: 2
Views: 1905
Reputation: 1002
Please try this:
import numpy as np
import pandas as pd
pd.set_option('mode.chained_assignment', None)
def make_streak(S1):
''' This function takes the values of Win/Loss column as input and returns a list of streak values
eg: input list : ['Loss', 'Win', 'Win', 'Win', 'Loss', 'Loss']
output list : ['L1', 'W1', 'W2', 'W3', 'L1', 'L2'] '''
S2 = []
for i in range(len(S1)):
if i==0:
S2.append(S1[i][0]+'1');continue
if S1[i] != S1[i-1]:
S2.append(S1[i][0]+'1')
if S1[i] == S1[i-1]:
S2.append(S1[i][0]+str(int(S2[-1][-1])+1))
return S2
def create_col(row):
''' This function takes every row of df as input, and returns the value for the new column:'Win/Loss Streak'(to be created)'''
df_temp = df[(df['Team'] == row['Team'])] #Create a temporary dataframe with passed in row's team name
df_temp['Streak'] = make_streak(df_temp['Win/Loss'].tolist())
return (df_temp[ df_temp['row_num'] == row['row_num']]['Streak'].values[0]) #Return df_temp's last column value which matches passed in row's row number
df['row_num'] = np.arange(1,len(df)+1) #Creates a column 'row_num' to identify the rows uniquely
df['Win Streak'] = df.apply(create_col,axis=1) #Call create_col function and create 'Win Streak' column
df.drop(columns = 'row_num', inplace = True) #Drop the column 'row_num'
df
Upvotes: 1