BenjaminClayton
BenjaminClayton

Reputation: 37

Pandas manipulation with GroupBy

I have a very large table of data containing play_id, position and frame columns.

The following shows example data (simplified for clarity):

play_id position frame
1 A 1
1 A 1
1 B 1
1 A 2
1 A 2
1 B 2
2 A 1
2 B 1
2 B 1
2 A 2
2 B 2
2 B 2

I want to count the number of players in each position for each play_id (they are consistent across frames for a given play_id) and append a number to each to make them unique.

This would result in the following:

play_id position frame
1 A_1 1
1 A_2 1
1 B_1 1
1 A_1 2
1 A_2 2
1 B_1 2
2 A_1 1
2 B_1 1
2 B_2 1
2 A_1 2
2 B_1 2
2 B_2 2

In reality I have 7 different positions, 25 frames and around 500,000 play_ids (along with many more columns).

How can I achieve this in an efficient manner? I assume groupby should be used, but I can't figure out how.

Upvotes: 0

Views: 32

Answers (1)

yatu
yatu

Reputation: 88305

You could groupby.cumcount, to group by all columns and add a count for the positions:

df['position'] = df.position.str.cat(df.groupby(['play_id','frame','position']).position
                                       .cumcount().add(1).astype(str), 
                                     sep='_')

print(df)

    play_id position  frame
0         1      A_1      1
1         1      A_2      1
2         1      B_1      1
3         1      A_1      2
4         1      A_2      2
5         1      B_1      2
6         2      A_1      1
7         2      B_1      1
8         2      B_2      1
9         2      A_1      2
10        2      B_1      2
11        2      B_2      2

Upvotes: 1

Related Questions