RobbieBeats
RobbieBeats

Reputation: 59

How can I add an id column based on unique combinations of other columns?

I'm using NBA play by play data that has player ID numbers for each defensive player and each offensive player. I'd like to add a column for each lineup combination, so a deflinid and offlinid.

Here's the code for an example of the dataset:

df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5],[11,12,13,14,15,1,2,3,4,5,5,5],[11,12,13,14,15,1,2,3,4,6,10,10],[11,12,13,14,16,2,3,4,5,6,5,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])

Then from there, I'd like to create columns with lineup IDs for each unique 5-player ID combination.

Here's an example of the 2 columns I'd like to be generated and added to df based on the example df above:

offlinid  deflinid
       1         4
       2         4
       3         4
       4         1
       4         2
       5         3

Thanks in advance!

Upvotes: 0

Views: 195

Answers (1)

Andy L.
Andy L.

Reputation: 25269

Using pd.concat to stack offplayerX columns on top of defplayerX columns. Next, agg every row to tuples and call rank and unstack

offcols = ['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5']
defcols = ['defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5']

df1 = pd.concat([df[offcols], df[defcols].rename(columns=dict(zip(defcols, offcols)))], 
                 keys=['offlinid',  'deflinid'])

df_final = df1.agg(tuple, axis=1).rank(method='dense').unstack(0)

Out[92]:
   offlinid  deflinid
0       1.0       4.0
1       2.0       4.0
2       3.0       4.0
3       4.0       1.0
4       4.0       2.0
5       5.0       3.0

Upvotes: 1

Related Questions