Reputation: 165
I have the following dataframe and would like to create a column at the end called "dup" showing the number of times the row shows up based on the "Seasons" and "Actor" columns. Ideally the dup column would look like this:
Name Seasons Actor dup
0 Stranger Things 3 Millie 1
1 Game of Thrones 8 Emilia 1
2 La Casa De Papel 4 Sergio 1
3 Westworld 3 Evan Rachel 1
4 Stranger Things 3 Millie 2
5 La Casa De Papel 4 Sergio 1
Upvotes: 0
Views: 925
Reputation: 2069
This should do what you need:
df['dup'] = df.groupby(['Seasons', 'Actor']).cumcount() + 1
Output:
Name Seasons Actor dup
0 Stranger Things 3 Millie 1
1 Game of Thrones 8 Emilia 1
2 La Casa De Papel 4 Sergio 1
3 Westworld 3 Evan Rachel 1
4 Stranger Things 3 Millie 2
5 La Casa De Papel 4 Sergio 2
As Scott Boston mentioned, according to your criteria the last row should also be 2
in the dup
column.
Here is a similar post that can provide you more information. SQL-like window functions in PANDAS
Upvotes: 2