Reputation: 263
my dataframe looks like this
ID Stage Recipe
1 ABC X
1 ABC D
1 ABC D
1 ABC D
2 ABC X
2 ABC X
1 BCD Y
1 CDE Y
What I want, is to count Recipes that occur more than once in a particular stage (which is equal to a rework) for a particular ID.
So Output for this dataframe should look like:
ID Rework_steps
1 2
2 1
Gladly appreciate any help!
Upvotes: 2
Views: 59
Reputation: 263
I actually solved it myself now, by simplying counting the duplicates
duplicates=`df[df.duplicated(subset=['ID','Stage','Recipe'])]
and
duplicates.groupby('ID').size()
Upvotes: 1
Reputation: 153460
Try using what @RubenHelslott suggests this way:
(df.groupby(['ID', 'Recipe'])['Stage'].nunique() == 1).sum(level=0)
Output:
ID
1 2
2 1
Name: Stage, dtype: int64
Details:
Count the number of unique Stage in each ID and Recipe using groupby
and nunique
, then see if that number is equal to 1 which returns a multiindex boolean series. Now, we can use sum
with parameter level=0
to sum Trues to the outer most index level 'ID'.
Upvotes: 2