Michelle
Michelle

Reputation: 263

Count duplicates in specific column

my dataframe looks like this

ID      Stage      Recipe
1       ABC        X
1       ABC        D
1       ABC        D
1       ABC        D
2       ABC        X
2       ABC        X     
1       BCD        Y
1       CDE        Y

What I want, is to count Recipes that occur more than once in a particular stage (which is equal to a rework) for a particular ID.

So Output for this dataframe should look like:

ID     Rework_steps
1      2
2      1

Gladly appreciate any help!

Upvotes: 2

Views: 59

Answers (2)

Michelle
Michelle

Reputation: 263

I actually solved it myself now, by simplying counting the duplicates duplicates=`df[df.duplicated(subset=['ID','Stage','Recipe'])] and duplicates.groupby('ID').size()

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

Try using what @RubenHelslott suggests this way:

(df.groupby(['ID', 'Recipe'])['Stage'].nunique() == 1).sum(level=0)

Output:

ID
1    2
2    1
Name: Stage, dtype: int64

Details:

Count the number of unique Stage in each ID and Recipe using groupby and nunique, then see if that number is equal to 1 which returns a multiindex boolean series. Now, we can use sum with parameter level=0 to sum Trues to the outer most index level 'ID'.

Upvotes: 2

Related Questions