June Smith
June Smith

Reputation: 165

How to create a duplicate flag (column) that counts duplicate rows based on two columns?

I have the following dataframe and would like to create a column at the end called "dup" showing the number of times the row shows up based on the "Seasons" and "Actor" columns. Ideally the dup column would look like this:

               Name  Seasons        Actor   dup
0   Stranger Things        3       Millie     1
1   Game of Thrones        8       Emilia     1
2  La Casa De Papel        4       Sergio     1     
3         Westworld        3  Evan Rachel     1
4   Stranger Things        3       Millie     2
5  La Casa De Papel        4       Sergio     1

Upvotes: 0

Views: 925

Answers (1)

Kurt Kline
Kurt Kline

Reputation: 2069

This should do what you need:

df['dup'] = df.groupby(['Seasons', 'Actor']).cumcount() + 1

Output:

               Name  Seasons        Actor  dup
0   Stranger Things        3       Millie    1
1   Game of Thrones        8       Emilia    1
2  La Casa De Papel        4       Sergio    1
3         Westworld        3  Evan Rachel    1
4   Stranger Things        3       Millie    2
5  La Casa De Papel        4       Sergio    2

As Scott Boston mentioned, according to your criteria the last row should also be 2 in the dup column.

Here is a similar post that can provide you more information. SQL-like window functions in PANDAS

Upvotes: 2

Related Questions