Reputation: 37
I have a very large table of data containing `play_id`, `position` and `frame` columns.
- `play_id` - refers to a particular play
- `position` - either A or B (there are 3 players (rows) in total for each play, in any combination of As and Bs)
- `frame` - time frame (e.g. imagine a freeze-frame every 1 second)
The following shows example data (simplified for clarity):
| play_id | position | frame |
|---------|----------|-------|
| 1       | A        | 1     |
| 1       | A        | 1     |
| 1       | B        | 1     |
| 1       | A        | 2     |
| 1       | A        | 2     |
| 1       | B        | 2     |
| 2       | A        | 1     |
| 2       | B        | 1     |
| 2       | B        | 1     |
| 2       | A        | 2     |
| 2       | B        | 2     |
| 2       | B        | 2     |
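For reference, a minimal pandas sketch that rebuilds this example table (the DataFrame name `df` is assumed; the values are copied from the table above):

import pandas as pd

# example data from the question
df = pd.DataFrame({
    'play_id':  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'position': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'B'],
    'frame':    [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
})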
I want to count the number of players in each position for each `play_id` (the players are consistent across frames within a given `play_id`) and append a running number to each position so that every player's label is unique.
This would result in the following:
| play_id | position | frame |
|---------|----------|-------|
| 1       | A_1      | 1     |
| 1       | A_2      | 1     |
| 1       | B_1      | 1     |
| 1       | A_1      | 2     |
| 1       | A_2      | 2     |
| 1       | B_1      | 2     |
| 2       | A_1      | 1     |
| 2       | B_1      | 1     |
| 2       | B_2      | 1     |
| 2       | A_1      | 2     |
| 2       | B_1      | 2     |
| 2       | B_2      | 2     |
In reality I have 7 different positions, 25 frames and around 500,000 play_ids (along with many more columns).
How can I achieve this in an efficient manner? I assume groupby
should be used, but I can't figure out how.
Upvotes: 0
Views: 32
Reputation: 88305
You could use `groupby.cumcount` to group by all three columns and add a count for each position:
# number the players within each (play_id, frame, position) group,
# then append that number to the position label
df['position'] = df.position.str.cat(
    df.groupby(['play_id', 'frame', 'position'])
      .position.cumcount().add(1).astype(str),
    sep='_')

print(df)
play_id position frame
0 1 A_1 1
1 1 A_2 1
2 1 B_1 1
3 1 A_1 2
4 1 A_2 2
5 1 B_1 2
6 2 A_1 1
7 2 B_1 1
8 2 B_2 1
9 2 A_1 2
10 2 B_1 2
11 2 B_2 2
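An equivalent formulation (a sketch, not part of the original answer) computes the suffix first with `cumcount` and concatenates with plain `+` instead of `str.cat`:

# number the rows 0, 1, ... within each (play_id, frame, position) group,
# shift to 1-based and append to the position label
suffix = df.groupby(['play_id', 'frame', 'position']).cumcount().add(1).astype(str)
df['position'] = df['position'] + '_' + suffix

Both versions group by the same three columns, so they produce the same labels.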
Upvotes: 1