Reputation: 59
I'm using NBA play by play data that has player ID numbers for each defensive player and each offensive player. I'd like to add a column for each lineup combination, so a deflinid and offlinid.
Here's the code for an example of the dataset:
df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5],[11,12,13,14,15,1,2,3,4,5,5,5],[11,12,13,14,15,1,2,3,4,6,10,10],[11,12,13,14,16,2,3,4,5,6,5,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])
Then from there, I'd like to create columns with lineup IDs for each unique 5-player ID combination.
Here's an example of the 2 columns I'd like to be generated and added to df based on the example df above:
offlinid deflinid
1 4
2 4
3 4
4 1
4 2
5 3
Thanks in advance!
Upvotes: 0
Views: 195
Reputation: 25269
Using pd.concat
to stack offplayerX
columns on top of defplayerX
columns. Next, agg
every row to tuples and call rank
and unstack
offcols = ['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5']
defcols = ['defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5']
df1 = pd.concat([df[offcols], df[defcols].rename(columns=dict(zip(defcols, offcols)))],
keys=['offlinid', 'deflinid'])
df_final = df1.agg(tuple, axis=1).rank(method='dense').unstack(0)
Out[92]:
offlinid deflinid
0 1.0 4.0
1 2.0 4.0
2 3.0 4.0
3 4.0 1.0
4 4.0 2.0
5 5.0 3.0
Upvotes: 1