Reputation: 327
I have the following extract of dataframe that represents the play by play of a basketball match:
import pandas as pd
data = {'actionNumber':
[669,668,667,666,665,663,662,661,660,659,657,656,655], 'gt':['03:12','03:12','03:18','03:18','03:36','03:48','03:48','03:48','03:48','03:51','03:51','03:55','03:58'], 'actionType':['steal','turnover','assist','3pt','2pt','freethrow','freethrow','foulon','foul','steal','turnover','rebound','2pt'], 'player':['S. WOHLWEND','F. DELLA MEA','Z. RIAUKA','A. VIOTTI','J. VIANA','A. VIOTTI','A. VIOTTI','A. VIOTTI','E. GONZALEZ','A. VIOTTI','A. ARISTIMUNO','P. BLACKWELL','A. VIOTTI'], 'tno':['1', '2','1','1','2','1','1','1','2','1','2','2','1']}
df = pd.DataFrame(data)
df
I want to determine the time that each team has a posession. In the tno column you can see basically which team has the posession, so my purpose is to iterate over pandas, starting from the last row, and mark it as posession one, after that going to the next and mark all the rows where tno has a 2 as posession 2, after that, when the tno changes to 1, mark it as posession 3, and repeat the action until I count and mark all the different posessions; so finally I can obtain the mean of each posession per quarter, per team, to see which team has faster posessions, etc.
I have already tried with a for loop and I can do it in this way but I am trying to learn pandas and in every part I see that loops should be avoided, I have done with success same parts using shifts, but I don't know how to use shift withou knowing how many plays is it going to last each posession.
I am expecting the following:
data = {'actionNumber':
[669,668,667,666,665,663,662,661,660,659,657,656,655], 'gt':['03:12','03:12','03:18','03:18','03:36','03:48','03:48','03:48','03:48','03:51','03:51','03:55','03:58'], 'actionType':['steal','turnover','assist','3pt','2pt','freethrow','freethrow','foulon','foul','steal','turnover','rebound','2pt'], 'player':['S. WOHLWEND','F. DELLA MEA','Z. RIAUKA','A. VIOTTI','J. VIANA','A. VIOTTI','A. VIOTTI','A. VIOTTI','E. GONZALEZ','A. VIOTTI','A. ARISTIMUNO','P. BLACKWELL','A. VIOTTI'], 'tno':['1','2','1','1','2','1','1','1','2','1','2','2','1'],'pn': ['7','6','5','5','4','3','3','3','','3','2','2','1']}
df = pd.DataFrame(data)
df
where pn is the posession number (ignore the row 57b it is not important for this problem), so I can obtain the last of each posession, how it ended, etc.
Upvotes: 2
Views: 230
Reputation: 7353
Here is another attempt to break down the steps.
df = df.sort_values('actionNumber', ascending=True)
tno = df.tno.to_list()
# check if fouls happened
isfoul = (df.actionType=='foul').astype(int).to_list()
tnox = [int(x)-int(y) for x,y in zip(tno,isfoul)]
# calculate when a swap happens
swap_counter = [1] + [np.abs(int(x) - int(y)) for x, y in zip(tnox[:-1], tnox[1:])]
# evaluate a cumulative sum of all swaps
df['pn'] = np.array(swap_counter).cumsum()*np.abs(1-np.array(isfoul))
# adjust column pn: (where rows with actionType='foul') leave cell value empty
df.pn.loc[df.actionType=='foul'] = ''
df = df.sort_values('actionNumber', ascending=False)
df
Upvotes: 1