Reputation: 59
I have the following dataframe:
df = pd.DataFrame([
('A', 'A', 'Cat'),
('A', 'A', 'Dog'),
('A', 'A', 'Cat'),
('A', 'B', 'Dog'),
('B', 'B', 'Rat'),
('B', 'C', 'Cat'),
('B', 'C', 'Cat')
], columns=['id', 'group', 'Animal'])
I want to group it by id
and group
and calculate the occurrence of Cat
in each group. An example output will be:
[2, 0, 0, 2]
2 cat
in group AA
,
0 cat
in group AB
,
0 cat
in group BB
,
2 cat
in group BC
Can anyone help? Thanks!
Upvotes: 4
Views: 21466
Reputation: 557
Your question is very similar (if not duplicate) to this question.
When you want to perform some aggregation which are not part of pandas (e.g. sum
, max
, min
) then you should use the apply method.
Please note that apply
can be slower, so don't start using everywhere.
Now, for your question, you start with grouping by:
df.groupby(by=["id", "group"])
Then, you select the Animal
column from the groups:
df.groupby(by=["id", "group"])['Animal']
At this point, you have values from the Animal
column for each different group, ready for any aggregation.
Let's define a function to count Cat
s in a pandas Series
:
def count_cat(series):
return (series == 'Cat').sum()
All you have to do now, is to pass this function to the apply
method, like the following:
df.groupby(by=["id", "group"])['Animal'].apply(count_cat).reset_index(name='count_cat')
And the result would be:
id | group | count_cat | |
---|---|---|---|
0 | A | A | 2 |
1 | A | B | 0 |
2 | B | B | 0 |
3 | B | C | 2 |
Upvotes: 5
Reputation: 1
if 0 values are not important than you can try:
list(df[df['Animal']=='Cat'].groupby(['id', 'group']).count()['Animal'])
Sorry for my English
Upvotes: 0
Reputation: 403218
You'll need mask
+ groupby
:
df['Animal'] = df['Animal'].mask(df['Animal'].ne('Cat'))
df.groupby(['id', 'group'])['Animal'].count().tolist()
[2, 0, 0, 2]
Upvotes: 9
Reputation: 323396
You can using value_counts
df.groupby(['id','group']).Animal.value_counts().unstack(fill_value=0).loc[:,'Cat'].tolist()
Out[144]: [2, 0, 0, 2]
More info return before the last step .loc
df.groupby(['id','group']).Animal.value_counts().unstack(fill_value=0)
Out[145]:
Animal Cat Dog Rat
id group
A A 2 1 0
B 0 1 0
B B 0 0 1
C 2 0 0
Upvotes: 7